Wednesday, April 24, 2024

Case-based Comparative Evaluation (CCE)

 





Ho-hum, yet another evaluation brand being promoted in an already crowded marketplace. FFS...

Yes, I think this reaction is understandable, but I think there is something here captured under this name (CCE) which has potential value.  I will try to explain...

Many evaluators make use of theories of change, as part of a theory-based approach to evaluation. Many theories of change are described in some type of diagrammatic form. And a typical feature of those diagrams is their convergent nature. That is, they start off with a range of different types of inputs and activities which follow various causal pathways towards a limited number of final outcomes.

This image is almost the complete opposite of what happens in actual practice on the ground. Financial inputs come from a limited number of sources. These become available to a small range of partners, who carry out their own range of activities in a variety of different locations, each with their own populations, including those intended and not intended to be affected. This description is of course a simplification, but it applies to many development aid programme designs. The point I'm making here is that this process is not convergent, it is divergent! It seems that the diagrammatic theories of change I have described are a type of Procrustean bed.


This blog posting has been most immediately prompted by a report I have just reviewed on potential evaluation strategies for a large national level climate finance strategy. The theory of change describes multiple causal pathways connecting the initial provision of government finance through to four types of expected impacts. With two of these causal pathways alone, the number of projects being funded is in the hundreds. The report struggled with the issue of how to measure the expected impacts given the scale and likely diversity of events on the ground, and with the corresponding challenge of how to sample those projects. Part of my diagnosis of the problem here was the evaluation team's measurement-focused approach. Another part was the weakness of the conceptual framework, i.e. the incapacity of the theory of change to capture the diversity of what was taking place.


Describing the alternative to my client is now my challenge. I think it has two parts. Firstly, one should start at the beginning, where the money becomes available, and then follow the money (and the people responsible) as it gets distributed according to its intended purposes. If things are not happening as expected early on in this process then this affects expectations of what might and might not be observable later on in the form of 'outcomes' or 'impacts'. Put crudely, there is no point trying to observe the impact of something that has not yet been delivered.


Secondly, as money is distributed from a central fund, decisions are going to be made about how it should be parcelled out in different amounts for different purposes through different institutions. Each time that happens, the decisions that have been made about how to do this are hopefully not random. Evaluating how those decisions were made may not necessarily be all that useful, because often there will be opaque micro, meso and macro political processes involved. But the announced decisions include some intentionally explicit expectations about the relevant purposes of different allocations. Interviews with those responsible for those allocations might also elicit more informal and more current expectations about what might be the short- and longer-term effects of some of these allocations, when compared to others.

The point I am emphasising here is that sometimes we can make evaluative judgements not based on any overriding predetermined criteria, but by using a more inductive process, where we compare one option to another. This is an excuse for me to quote Marx (G): 

Friend says to Marx – 'Life is difficult'.

Marx replies to friend – 'Compared to what?'

This type of inductive comparative evaluation doesn't have to be completely free form. It is conceivable, for example, that we could look at two tranches of government climate finance funding and ask those with proximate responsibilities for that funding what difference there might be between those blocks of funding in terms of how each might meet each of the OECD criteria. (These range in their concerns from the more immediate issues of coherence and efficiency to later concerns with effectiveness and impact.) Respondents' answers, in the form of expectations, can be seen as mini theories, a.k.a. hypotheses, that then may or may not be testable through the gathering of relevant data. In some cases the evidence might not be in the form of available data, but in the form of systems that could generate such data in a timely and reliable fashion.


Before these questions can be posed, the cases that are going to be compared need to be identified. The 'cases' in this example would be particular blocks of funding. Further along the implementation process the cases could be partners who are receiving funding, or activities that those partners are implementing, or communities those activities are directed towards. Given this range of options there is clearly a challenge here, similar to the sampling problem mentioned above: how to select cases for comparison.


The method I have been promoting, for a number of reasons, and which I think will address this challenge, is called hierarchical card sorting (HCS). If we are looking at a particular budget document which distributes funding into different purpose categories, we will be faced with the question of which of these categories to compare. One way forward is to let the respondent decide, especially if they have responsibilities in this area. With HCS the interviewer starts with a request, to someone who is informed about this budget and perhaps responsible for it, which is phrased like this: 'What is the most significant difference between all these budget categories in terms of how they will achieve the objectives of the climate finance strategy? Please sort the budget categories into two piles according to this difference and then explain it to me'. This question can then be reiterated by focusing on each of those two piles in turn and getting the respondent to break them into two smaller sub-piles. And so on... Having identified pairs of types of cases that can be compared, the respondent can then be asked for details about their expectations of the cases in one pile versus the other (See FN1).
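The repeated sorting described above produces a binary tree of piles, and it may help to see that structure written down. What follows is only a minimal sketch in Python, not an existing HCS tool: the names `Pile`, `split` and `comparisons`, the budget categories and the stated differences are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pile:
    """One pile of cases produced during a hierarchical card sort."""
    cases: List[str]
    difference: Optional[str] = None  # the respondent's stated difference, once split
    left: Optional["Pile"] = None
    right: Optional["Pile"] = None

    def split(self, difference: str, left_cases: List[str]) -> None:
        """Record the respondent's sort of this pile into two sub-piles."""
        unknown = set(left_cases) - set(self.cases)
        if unknown:
            raise ValueError(f"cases not in this pile: {sorted(unknown)}")
        right_cases = [c for c in self.cases if c not in left_cases]
        if not left_cases or not right_cases:
            raise ValueError("both sub-piles must contain at least one case")
        self.difference = difference
        self.left = Pile(list(left_cases))
        self.right = Pile(right_cases)


def comparisons(pile: Pile) -> List[Tuple[Optional[str], List[str], List[str]]]:
    """List every recorded comparison as (difference, one pile, other pile)."""
    if pile.left is None or pile.right is None:
        return []
    here = [(pile.difference, pile.left.cases, pile.right.cases)]
    return here + comparisons(pile.left) + comparisons(pile.right)


# Hypothetical budget categories and differences, for illustration only.
root = Pile(["adaptation grants", "mitigation loans", "capacity building", "research"])
root.split("funds physical works versus funds knowledge",
           ["adaptation grants", "mitigation loans"])
root.left.split("grant versus loan financing", ["adaptation grants"])

for difference, pile_a, pile_b in comparisons(root):
    print(difference, "|", pile_a, "vs", pile_b)
```

Each tuple returned by `comparisons` is one pair of piles about which the respondent can then be asked for their comparative expectations.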


There is a larger question here of course that also relates to sampling. Who are you going to interview in this way? The suggestion above was 'to follow the money'. In other words, to follow lines of responsibility and interview people about the domains of activity they are responsible for, using HCS as a means of structuring the discussion. There is a strategy choice here between what are known as breadth-first and depth-first search strategies. From a given point in a flow of funds (and of responsibilities) there can be distributions going in different directions, each of which could be explored. Following all of these is a form of breadth-first search. Alternatively the focus could be on just one of those distributions, following the subsequent distribution of funding and responsibility further down one (or a few) lines. This is a form of depth-first search. Which of those search strategies to pursue is probably a matter to be decided by the evaluation client. But the choice may also need to be adaptive, informed by what was found in earlier inquiries.
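To make the difference between the two strategies concrete, here is a small Python sketch that orders interviews along a flow of funds in both ways. It is an illustration under assumptions: the `FLOW` dictionary, the entity names and the two function names are hypothetical, and a real flow of funds and responsibilities would need to be mapped from budget documents and interviews.

```python
from collections import deque
from typing import Dict, List

# Hypothetical flow of funds: each entity maps to those it passes funding to.
FLOW: Dict[str, List[str]] = {
    "central fund": ["ministry A", "ministry B"],
    "ministry A": ["partner A1", "partner A2"],
    "ministry B": ["partner B1"],
    "partner A1": ["community X"],
    "partner A2": [],
    "partner B1": ["community Y"],
    "community X": [],
    "community Y": [],
}


def breadth_first(flow: Dict[str, List[str]], start: str) -> List[str]:
    """Visit every recipient at one level before moving further down."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        node = queue.popleft()
        order.append(node)
        for recipient in flow.get(node, []):
            if recipient not in seen:  # guard against entities funded twice
                seen.add(recipient)
                queue.append(recipient)
    return order


def depth_first(flow: Dict[str, List[str]], start: str) -> List[str]:
    """Follow one line of funding to its end before starting the next."""
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so that the first-listed recipient is followed first.
        stack.extend(reversed(flow.get(node, [])))
    return order


print(breadth_first(FLOW, "central fund"))
print(depth_first(FLOW, "central fund"))
```

An adaptive strategy could mix the two, for example going breadth-first for one or two levels and then depth-first down the lines that earlier interviews suggest are most informative.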


To be continued....
  
Courtesy Jacky Lieu: Comparison of Breadth-First Search and Depth-First Search: Understanding Their Methods and Uses 

But what about aggregation?

If you followed my suggested strategy, the closer you got to the people whose lives were of final/main concern, the smaller the segments of all the funding you would be looking at. The cases within each segment would be more comparable than those in a larger group, allowing more customised, context-specific assessments of impact. But how would you / the evaluation team then be able to make any overall statement about the strategy?

The way forward is to think of performance measurement in slightly different terms than just using a simple indicator-based measure. Imagine a scatter plot, with one dimension X describing relative, i.e. ranked, expectations of achievement and the other dimension Y describing actual/observed/assessed achievements. The entities in the scatter plot are the groups of cases in the smallest available sub-categories that were developed. Their rank position, relative to each other, is evident when all the binary assessments of expected performance are generated through the process described above. See here for more on how this is done. The scatter plot can in turn be summarised in at least two different ways: using a measure of correlation (of how achievement relates to expectations) and using Classification Accuracy. Equally importantly, qualitative descriptions can be given of cases that exemplify performance that most meets expectations, of cases that least meet them, and of positive and negative deviants (outliers).
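The two summary measures can be illustrated with a small Python sketch. It rests on several assumptions of mine that the text above does not specify: each smallest group is identified by its sequence of binary judgements (1 = the pile expected to do better at that split), ranks come from ordering those sequences, the correlation is Spearman's rank correlation, and classification accuracy compares the expected top half with the observed top half. The group names and scores are invented.

```python
from typing import Dict, List, Sequence, Tuple


def rank_from_paths(paths: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Rank groups by their binary judgements; rank 1 = expected to achieve most."""
    ordered = sorted(paths, key=lambda g: paths[g], reverse=True)
    return {group: i + 1 for i, group in enumerate(ordered)}


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (undefined if either input is all ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def classification_accuracy(expected_rank: Dict[str, int],
                            observed: Dict[str, float]) -> float:
    """Share of groups whose observed half (top/bottom) matches the expected half."""
    groups = list(expected_rank)
    half = len(groups) // 2
    expected_top = set(sorted(groups, key=lambda g: expected_rank[g])[:half])
    observed_top = set(sorted(groups, key=lambda g: observed[g], reverse=True)[:half])
    return sum((g in expected_top) == (g in observed_top) for g in groups) / len(groups)


# Hypothetical groups: judgement paths and observed achievement (higher = better).
paths = {"A": (1, 1), "B": (1, 0), "C": (0, 1), "D": (0, 0)}
observed = {"A": 0.8, "B": 0.4, "C": 0.6, "D": 0.2}

expected = rank_from_paths(paths)
groups = list(paths)
# Negate observed scores so that rank 1 means "best" on both dimensions.
rho = spearman([expected[g] for g in groups], [-observed[g] for g in groups])
accuracy = classification_accuracy(expected, observed)
print(expected, round(rho, 2), accuracy)
```

The correlation summarises how closely the whole ordering of achievements follows expectations, while classification accuracy only asks whether each group landed in the expected half, so the two can diverge and are worth reporting together.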

What we could end up with is a tree structure documenting multiple routes to both high and low performance, implemented in varying contexts (describable at different levels of scale).

To be continued....


PS1: When asking about expected effects of one type of allocation versus another, it may make sense to encourage a focus on more immediately expected effects first, and then later ones. The more immediate effects may be more likely, more easily articulated and more evaluable.

PS2: Hughes-McLure, S. (2022). Follow the money. Environment and Planning A: Economy and Space, 54(7), 1299–1322. https://doi.org/10.1177/0308518X221103267