How can we learn when we don’t understand the problem?

written by Salimah Samji

Most development practitioners think that they are working on problems. However, what they often mean by the word ‘problem’ is the ‘lack of a solution’. This leads to designing typical, business as usual interventions, without addressing the actual problem. Essentially, they sell solutions to specific problems they have identified and prioritized instead of solving real and distinct problems.

If the problem identification is flawed, then it does not matter whether or not you do a gold-standard RCT: you will neither solve the problem nor learn about what works. Here’s a great example. A recent paper entitled “The permanent input hypothesis: the case of textbooks and (no) student learning in Sierra Leone” found that a public program providing textbooks to primary schools had no impact on student performance because the majority of books were stored rather than distributed.

Could they not have learned that the textbooks were being locked up, cheaper and faster, through some routine monitoring or audit process (which could have led to understanding why they were locked up and then perhaps trying to find other ways to improve access to the textbooks – assuming that was their goal)? Was an RCT really necessary? More importantly, what was the problem they were trying to solve? What was their causal model or theory of change? If you provide textbooks to children then learning outcomes will improve?

Interestingly, the context section of the paper mentions that “the civil war severely impacted the country’s education system leading to large-scale devastation of school infrastructure, severe shortages of teachers and teaching materials, overcrowding in many classrooms in safer areas, displacement of teachers, frequent disruptions of schooling, psychological trauma among children, poor learning outcomes, weakened institutional capacity to manage the system, and a serious lack of information and data to plan service provision.” In addition, they also found variance between regions and in one remote council, “less than 50 percent of all schools were considered to be in good condition, with almost 20 percent falling under the category ‘no roof, walls are heavily damaged, needs complete rehabilitation.’”

Honestly, in a complex context like this, it isn’t clear or obvious that providing textbooks would make much difference even if they were handed out to the children, especially since they are written in English. Apparently, the teachers teach in Krio in the early years and then switch to English in Grades 4 and 5. Based on the context above, that sounds more like fiction than fact.

In environments like these, real problems are complex and scary, and it is easier to ignore them than to address them. A possible way forward could be to break the problem down into smaller, more manageable pieces using tools like problem trees, the Ishikawa diagram and the ‘5 whys.’ Then design an intervention, try, learn, iterate and adapt.

For more, watch the BSC videos on deconstructing sticky problems and problem-driven sequencing.

The Chief Minister Posed Questions We Couldn’t Answer

Guest post written by Jeffrey Hammer

I was recently at a conference in Lahore, Pakistan sponsored by the International Growth Centre where the keynote address was given by Shahbaz Sharif, the Chief Minister of the province of Punjab, Pakistan (100+ million people). While fun to see old friends and colleagues, the conference was a little depressing in the way it reflected the state of the development economics profession.

The Chief Minister posed serious questions that have traditionally been the bread and butter of the economics profession. Unfortunately, we are not even trying to answer them any more. The specific question was “Should I put more money into transport? Infrastructure (power, roads, water)? Law and order? Social services? Or what? And where am I going to get the money?” What questions could be more solidly part of the core of economics than these? Unfortunately none of these were even remotely the focus of the “evidence-based” policy making discussed.

Almost all of the cases analyzed were single, simple policy “tweaks” that were, first of all, isolated from the broader market context in which they occurred and, second, had no conception of opportunity cost: what would we have to give up to pursue these things? We had an answer to “how to improve a public food distribution system” but even with a precise answer (to whether a tweak would work) we had no idea whether the substantial amount of money funding such a system is a good idea. Maybe the Chief Minister would be better off improving education or road networks or police or rural electricity. Some of these alternative policies could have more impact on food consumption than food distribution if we thought about how the world worked. Getting food to market securely (roads, better cold storage, trustworthy police and safe roads – this is Pakistan, which no one seemed to notice) may increase food availability much more than any tunnel-visioned food program. Or not – maybe the food distribution system is better. We just don’t know. And none of us “experts” are trying to find out.

On spending priorities, what we need is the old fashioned notion of opportunity cost. “Evidence” now is “did something work?” meaning did it have any effect at all? or “can we get it to work a little better?” But the real question in such a resource-constrained economy is “does it work well enough to take money away from the power plant it prevented or any other thing money could have been used for?”  Or even, “is it better than leaving the money in private hands by not collecting the taxes to pay for it?” Besides not knowing the marginal welfare cost of taxation (anyone remember that?), we forget that poor people use their money for food, so the first-order effect of tax revenues is to make poor children hungrier. Is the benefit from secondary education or bicycles or the fertilizer subsidy so good as to impose this cost on these children?  We don’t know who ultimately pays taxes (when wages, for example, respond to indirect taxation) but it is likely that poor people, the majority of the population, pay at least some substantial share. And we don’t know how badly distorted the tax system is – in its very structure, not just in its administration. The incidence and efficiency loss of the whole structure of taxation are the first order answers the Chief Minister needs. No one studies these anymore.

When someone says “we should have more “X” because we have evidence that it works”, the response should be “compared to what?” What should we cut in order to promote your particular interest? My hobby horse these days is more sanitation in South Asia. I should have to defend it against (at least) a few alternatives.

It’s not like we have no basis for making this comparison. We usually try to determine which things the private sector (i.e. almost everyone – farmers, bicycle manufacturers and repairmen, truckers, shopkeepers, halal sausage casing makers) can be safely relied upon to produce, where it goes somewhat wrong (exactly how bad are private schools or doctors?) and where it is a flaming disaster such that the government is utterly indispensable. While we’ve all drawn the gap between public and private costs (or benefits) to help us talk about optimal Pigouvian taxes, when was the last time anyone tried to measure this one, central concept for valuing interventions in developing countries? Or in developed countries, for that matter? We look at enrollment rates (or even learning rates) but never ask “how much is this secondary education worth, and how much of that isn’t captured by the student?” Further, since there is no reason to think the number is the same in any two places, even if there were a couple of such studies, it wouldn’t make up the bulk of what we call policy-relevant research. And it’s not like it’s easy to do so we can’t just say “let the practitioner-types do the (routine) calculations”. There is nothing routine to it at all.

In the conference, several research projects measured an effect (not an externality, not a welfare loss – just an effect) that could well be part of an almost completely private good with no serious market failures to speak of. Can it really be the case that date exporters genuinely didn’t know that packaging for export was available (and wouldn’t a phone call to either the exporters or the marketing wing of the packaging producers suffice)? Did football producers really need to know a better pattern for cutting pentagons out of leather when mechanized stitching (as the commentator on that paper noted) is swiftly changing the entire production process worldwide? Will the competition that is currently mechanizing allow firms to exist even with the 10% higher profits that a better pattern enables? And are policy makers (even with Ivy League economists as their advisors) really going to make better decisions than those producers or, much more importantly, the competitive forces in the economy?

My defense of my promoting sanitation is that I contrasted the value of health via providing public goods (sewer systems in cities) to spending on publicly provided health care (a rival and excludable service – I’m avoiding the “p” word, this being the sub-continent). I don’t know if I’ve cleanly identified the effect that I purport to have measured – whether open defecation without sewage in slums damages the health of its residents – but it makes sense, is tied to most people’s notions of the nature of public and private goods, and gives some evidence of an externality. One reason to avoid specifying which service should be sacrificed is to avoid fights. Even fairly convincing evidence that publicly provided healthcare is of questionable value can provoke uncomfortable arguments. But not even mentioning the opportunity cost of a proposed policy is irresponsible.

On collecting more taxes: this is, of course, a core government activity. Any way we can efficiently get more money into government coffers to support critical public services is to be applauded. But what we were treated to was a two-year experiment on something that looks like tax-farming (and indeed, was titled as such). Higher powered incentives to collect taxes? When you’re being watched? Tax inspectors didn’t know an experiment was underway? Even if it was double blind (which it was not), can a two-year project using currently recruited tax inspectors (i.e., those that entered public service expecting to get a salary without having to work too much) anticipate what happens in equilibrium when everyone figures out how to make money from these high-powered incentives? That is, core government service or not, there is a labor market in which the people whom this experiment purports to study operate. It is the nature of the long-run equilibrium of that market that is the proper level of analysis for policy purposes, not the behavior of the particular individuals who happen to have the job at present. As the commentator on that paper noted, the proposal looked like the medieval version of tax farming. But that scheme always deteriorated in time (longer than a two-year experiment would tell us) into an ugly system that brought down rulers.

The Chief Minister is a committed and capable man. With the recent elections behind him, he has the opportunity to actually accomplish things. He deserves much better support than we’re giving him.

This post originally appeared as a World Bank Blog.

Rigorous Evidence Isn’t

written by Lant Pritchett

Currently, there are many statements floating around in development about the use of “rigorous evidence” in formulating policies and programs. Nearly all of these claims are fatuous. The problem is, rigorous evidence isn’t.

That is, suppose one generates some evidence about the impact of some programmatic or policy intervention in one particular context that is agreed by all to be “rigorous” because it meets methodological criteria for internal validity of its causal claims. But the instant this evidence is used in formulating policy it isn’t rigorous evidence any more. Evidence would be “rigorous” about predicting the future impact of the adoption of a policy only if the conditions under which the policy was to be implemented were exactly the same in every relevant dimension as those under which the “rigorous” evidence was generated. But that can never be so, because neither economics nor any other social science has theoretically sound and empirically validated invariance laws that specify what “exactly the same” conditions would be.

So most uses of rigorous evidence aren’t.  Take, for instance, the justly famous 2007 JPE paper by Ben Olken on the impact of certain types of monitoring on certain types of corruption. According to Google Scholar as of today, this paper has been cited 637 times.  The question is, for how many of the uses of this “rigorous evidence” is it really “rigorous evidence”?  We (well, my assistant) sampled 50 of the citing papers with 57 unique mentions of Olken (2007).  Only 8 of those papers were about Indonesia (Of course even those 8 are only even arguably “rigorous” applications as they might be about different programs or different mechanisms or different contexts.)  47 of the 57 (82%) of the mentions are neither about Indonesia nor even an East Asia or Pacific country—they might be a review of the literature about corruption in general, about another country, or methodological.  We also tracked whether the words “context” or “external validity” appeared within +/- two paragraphs of the mention. In 34 of the 57 (60%) mentions, the evidence was not about Indonesia and did not mention that the results, while “rigorous” for the time, place and programmatic/policy context, have no claim to be rigorous about any other time, place, or programmatic/policy context.
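The percentages in the tally above can be rechecked from the counts reported in the text. This is only an arithmetic sketch: the variable names are made up for illustration, and the counts are the ones quoted in the paragraph.

```python
# Counts taken from the paragraph above; names are illustrative only.
TOTAL_MENTIONS = 57          # unique mentions of Olken (2007) in the 50 sampled papers
NOT_INDONESIA_OR_EAP = 47    # mentions not about Indonesia or an East Asia/Pacific country
NO_CONTEXT_CAVEAT = 34       # non-Indonesia mentions with no nearby "context"/"external validity"


def share_percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


print(share_percent(NOT_INDONESIA_OR_EAP, TOTAL_MENTIONS))
print(share_percent(NO_CONTEXT_CAVEAT, TOTAL_MENTIONS))
```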

Another justly famous paper, Angrist and Lavy (1999) in the QJE, uses regression discontinuity to identify the impact of class size on student achievement in Israel. This paper has been cited 1244 times. I looked through the first 150 citations to this paper (which Google Scholar sorts by the number of times the citing paper has itself been cited) and (other than other papers by the authors) not one mentioned Israel (not that surprisingly, as Israel is a tiny country) in the title or abstract while China, India, Bangladesh, Cambodia, Bolivia, UK, Wales, USA (various states and cities), Kenya and South Africa all figured. Angrist and Lavy do not, and do not claim to, provide “rigorous” evidence about any of those contexts.

If one is formulating policies or programs for attacking corruption in highway procurement in Peru or reducing class size in secondary school in Thailand, it is impossible to base those policies on “rigorous evidence” as evidence that is rigorous for Indonesia or Israel isn’t rigorous for these other countries.

Now, some might make the argument that formulation of policies or programs in context X should rely exclusively/primarily/preferentially on evidence that is “rigorous” in context Z because at least we know that in context Z in which it was generated the evidence is internally valid.  This is both fatuous and false as a general proposition.

Fatuous in that no one understands the phrase “policy based on rigorous evidence” to mean “policy based on evidence that isn’t rigorous with respect to the actual policy context to which it is being applied (because there are no rigorous claims to external validity) but rather based on evidence that is rigorous in some other context.”  No one understands it that way because that isn’t rigorous evidence.

It is also false as a general proposition. It is easy to construct plausible empirical examples in which the evidence suggests that the bias from lack of internal validity is much (much) smaller than the bias from lack of external validity, because the contextual variation in “true” impact is much larger than the methodological bias that simple methods incur from lacking “clean” causal identification. In these instances, better policy is made using “bad” (e.g. not internally valid) evidence from the same context than “rigorous” evidence from another context (e.g. Pritchett and Sandefur 2013).
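The comparison in the previous paragraph can be made concrete with a stylized calculation. What follows is a sketch under stated assumptions, not the method of Pritchett and Sandefur (2013). It assumes that true impacts in the two contexts are independent draws with the same mean and a standard deviation `context_sd`, and that the local, non-rigorous estimate carries a fixed `identification_bias`. Both function names and all numbers are hypothetical.

```python
import math


def rmse_transported_estimate(context_sd: float, sampling_se: float = 0.0) -> float:
    """Root-mean-square error of predicting impact in context X from an
    internally valid estimate made in another context Z.

    Assumption: true impacts in X and Z are independent draws with the same
    mean and standard deviation context_sd, so their difference has
    variance 2 * context_sd**2.
    """
    return math.sqrt(2 * context_sd**2 + sampling_se**2)


def rmse_local_estimate(identification_bias: float, sampling_se: float = 0.0) -> float:
    """Root-mean-square error of a same-context estimate that is not
    internally valid and is off by a fixed identification_bias."""
    return math.sqrt(identification_bias**2 + sampling_se**2)


# Hypothetical magnitudes, chosen only to illustrate the comparison.
transported = rmse_transported_estimate(context_sd=0.20)
local = rmse_local_estimate(identification_bias=0.10)
print(f"rigorous estimate from another context: RMSE = {transported:.3f}")
print(f"biased estimate from the same context:  RMSE = {local:.3f}")
```

Under these assumptions the local estimate is closer to the truth whenever the identification bias is smaller than the square root of two times the cross-context standard deviation. Which case holds in practice is an empirical question that the sketch does not settle.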

Sadly perhaps, there is no shortcut around using judgment and wisdom in assessing all of the available evidence in formulating policies and programs.  Slogans like “rigorous evidence” are an abuse, not a use, of social science.