Wednesday, February 13, 2013

My two particular problems with RCTs


Up till now I have tried not to take sides in the debate, when crudely cast as between those "for" and those "against" RCTs (Randomised Controlled Trials). I have always thought that there are "horses for courses" and that there is a time and place for RCTs, along with other methods, including non-experimental methods, for evaluating the impact of an intervention. I should also disclose that my first degree included a major and sub-major in psychology, much of which was experimental psychology. Psychologists have spent a lot of time thinking about rigorous experimental methods. Some of you may be familiar with one of the better known contributors to the wider debates about methodology in the social sciences - Donald T Campbell - a psychologist whose influence has spread far beyond psychology. Twenty years after my first degree, his writings on epistemology influenced the direction of my PhD, which was not about experimental methods. In fact it was almost the opposite in orientation - the Most Significant Change (MSC) technique was one of its products.

This post has been prompted by my recent reading of two examples of RCT applications, one which has been completed and one which has been considered but not yet implemented. They are probably not exemplars of good practice, but in that respect they may still be useful, because they point to where RCTs should not be used. The completed RCT was of a rural development project in India. The contemplated RCT was on a primary education project in a Pacific nation. Significantly, both were large scale projects covering many districts in India and many schools in the Pacific nation.

Average effects

The first problem I had is with the use of the concept of Average Treatment Effect (ATE) in these two contexts. The India RCT found a statistically significant difference in the reduction in poverty of households involved in a rural development project, compared to those who had not been involved. I have not queried this conclusion. The sample looked decent in size and the randomisation looked fine. The problem I have is with what was chosen as the "treatment". The treatment was the whole package of interventions provided by the project. This included various modalities of aid (credit, grants, training) in various sectors (agriculture, health, education, local governance and more). It was a classic "integrated rural development" project, where a little bit of everything seemed to be on offer, delivered partly according to the designs of the project managers, and partly according to beneficiary plans and preferences. So, in this context, how sensible is it to seek the average effect on households of such a mixed-up salad of activities? At best it tells us that if you replicate this particular mix (and God knows how you will do that...) you will be able to deliver the same significant impact on poverty. Assuming that can be done, this must still be about the most inefficient replication strategy available. It would be far preferable to find out which particular project activities (or combinations thereof) were most effective in reducing poverty, and then to replicate those.
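To illustrate the point with a toy simulation (entirely hypothetical numbers, not data from the India project), a significant average treatment effect for a bundled "treatment" can easily conceal which mix of activities is actually doing the work:

```python
# Minimal sketch with made-up data: a significant ATE for a bundled "treatment"
# can hide which activity mixes actually reduce poverty.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Randomised assignment to the project (treatment) or to the control group
treated = rng.integers(0, 2, n).astype(bool)

# Within the treated group, households receive different mixes of activities
credit = treated & (rng.random(n) < 0.5)
training = treated & (rng.random(n) < 0.5)

# Hypothetical outcome: only the credit + training combination shifts poverty scores
outcome = rng.normal(0, 1, n) + 0.6 * (credit & training)

ate = outcome[treated].mean() - outcome[~treated].mean()
print(f"Average treatment effect: {ate:.2f}")  # positive, and 'significant' at this sample size

# But the average conceals the heterogeneity that matters for replication
for label, mask in [("credit only", credit & ~training),
                    ("training only", training & ~credit),
                    ("credit + training", credit & training)]:
    print(f"{label}: {outcome[mask].mean() - outcome[~treated].mean():.2f}")
```

The overall ATE here is real, but it is simply a dilution of an effect produced by one particular combination of activities - which is exactly the information a would-be replicator needs and the averaged result does not provide.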

Even the accountability value of the RCT finding was questionable. Where direct assistance is being provided to households, a plausible argument could be made that process tracing (by a decent auditor) would provide good enough assurance that assistance was reaching those intended. In other words, pay more attention to the causal "mechanism".

The proposed RCT of the primary education project had similar problems, in terms of its conception of a testable treatment. It proposed comparing the impact of two project "components", by themselves and in combination. However, as in India, each of these project components contained a range of different activities which would be variably made available, and variably taken up, across the project area.

Such projects are commonplace in development aid. Projects focusing on a single intervention, such as immunisation or cash transfers, are the exception, not the rule. The complex design of most development projects, tacitly if not explicitly, reflects a widespread view that promoting development involves multiple activities, whose specific composition often needs to be localised.

To summarise: it is possible to calculate average treatment effects, but it is questionable how useful that is in the project settings I have described, where there is a substantial diversity of project activities and combinations thereof.


Context

It is commonplace amongst social scientists, especially the more qualitatively oriented, to emphasise the importance of context. Context is also important in the use of experimental methods, because it is a potential source of confounding factors, obscuring the impact of the independent variable under investigation.

There are two ways of dealing with context. One is to rule it out, e.g. by randomising access to the treatment so that historical and contextual influences are the same for intervention and control groups. This was done in both the India and Pacific RCT examples. In India there were significant caste and class variations that could have influenced project outcomes. In the Pacific there were significant ethnic and religious differences. Such diversity often seems to be inherent in large scale development projects.

The hoped-for result of using this ruling-out strategy is a rigorous conclusion about the effectiveness of an intervention, one that stands on its own, independent of the context. But how useful will that be? Replication of the same or a similar project will have to take place in a real location where context will have its effects. How sensible is it to remain intentionally ignorant of those likely effects?

The alternative strategy is to include potentially relevant contextual factors in the analysis. Doing so takes us down the road of a configurational view of causation, embodied in the theory-led approaches of Realist Evaluation and Qualitative Comparative Analysis (QCA), and also in the use of data mining procedures that are less familiar to evaluators (Davies, 2012).
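As one illustration of what a data-mining route to configurations might look like (a minimal sketch, using invented variable names rather than anything from either project), a decision tree can search for combinations of activities and contextual attributes associated with good outcomes, instead of averaging the context away:

```python
# Minimal sketch with hypothetical variables: a decision tree as one data-mining route
# to finding configurations of activities AND context linked to the outcome of interest.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 500

# Binary activity and context attributes for each household or school (illustrative names)
X = rng.integers(0, 2, (n, 4))
features = ["credit", "training", "low_caste_majority", "remote_district"]

# Hypothetical rule built into the data: training works, but only outside remote districts
y = ((X[:, 1] == 1) & (X[:, 3] == 0)).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=features))
```

The printed tree recovers the activity-in-context configuration directly, which is the kind of finding a replicator or scaler-up can actually act on.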

Evaluation as the default response

In the Pacific project it was even questionable whether an evaluation spanning a period of years was the right approach (RCT-based or otherwise). Outcome data on student participation and performance will be available on a yearly basis through various institutional monitoring mechanisms. Education is an area where data abounds, relative to many other development sectors, notwithstanding the inevitable quality issues. It could be cheaper, quicker and more useful to develop and test (annually) predictive models of the outcomes of concern. One can even imagine using crowdsourcing services like Kaggle to do so. As I have argued elsewhere, we could benefit by paying more attention to monitoring, relative to evaluation.
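A minimal sketch of what such an annual cycle could look like, assuming hypothetical monitoring files and column names (none of these are from the Pacific project): fit a simple predictive model on one year's records, then test it prospectively against the following year's.

```python
# Minimal sketch with assumed files and columns: predict the outcome of concern from
# one year's monitoring records, then test the model against the next year's records.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Yearly school- or pupil-level monitoring records (hypothetical file and column names)
year1 = pd.read_csv("monitoring_2012.csv")
year2 = pd.read_csv("monitoring_2013.csv")

predictors = ["teacher_attendance", "textbooks_per_pupil", "distance_to_school_km"]
target = "completed_grade"  # e.g. 1 if the pupil completed the grade, else 0

model = LogisticRegression(max_iter=1000)
model.fit(year1[predictors], year1[target])

# Prospective test: how well does last year's model predict this year's outcomes?
predicted = model.predict_proba(year2[predictors])[:, 1]
print("Out-of-year AUC:", roc_auc_score(year2[target], predicted))
```

A model whose predictive performance holds up, year after year, on data it has not seen is itself a form of accumulating evidence about what matters, obtained at a fraction of the cost of a multi-year experiment.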

In summary, be wary of using RCTs where development interventions are complex and variable, where there are big differences in the context in which they take place, and where an evaluation may not even be the most sensible default option.
 


1 comment:

  1. Hi Rick,

    Excellent view as usual. I used to say that an RCT can be relevant when there is only one expected impact which can be approached by one indicator; now I'll add: and only one cause.

    The fact is that the content of the "causal package" is also a real issue in theory-based evaluation, and we also need to be clearer on what part of an intervention has the expected impacts, and how.
    I was recently involved in the evaluation of a regional transportation policy in which there was only one final expected impact (shifting traffic from road to rail) and many potential causes, either in the intervention logic (e.g. level and quality of train services) or outside of it (e.g. attitudes towards mass transportation, cost of journeys by car). The fact that there was only one impact to obtain made it possible for us to concentrate on the causes, which we ranked and defined as sufficient or not, and necessary or not.

    But the work we did there, which was possible because the conditions were good, highlights that in many evaluations, where the intervention contains many instruments and expects several impacts, we tend to concentrate on proving that the changes are at least partially due to the intervention rather than really differentiating between the causes.

    Waiting for the next article!
