The urgent need for the discipline of evaluation to change paradigm from ‘theory’ to ‘reason’
Updated: Nov 15, 2022
Almost forty years ago Donald Campbell, the most prominent proponent of experimental design in evaluation (the Campbell Collaboration bears his name), wrote a hopeful but rather skeptical paper on the question, Can we be scientific in the applied social sciences? He provided a four-point strategy in response to that question.
Fund lots of local programs, including funds for program staff to do whatever evaluation they want to ‘debug’ them, and only later quantify the impacts of ‘proud programs’.
For these evaluations, award multiple contracts rather than commissioning a single evaluation, and ensure competitive reanalysis of results.
Focus on technical appendices written for academics, and don’t pretend to do a scientific evaluation on an allocation of funds rather than a coherent program.
Avoid measuring outcomes as a tool of administrative control – focus on improvement rather than accountability.
Campbell’s answer to his question was ‘A feeble yes: we can be somewhat more scientific than we are now…An equally feeble no: If we present our resulting improved truth claims as though they were definitive achievements comparable to those in the physical sciences, and thus deserving to override ordinary wisdom when they disagree, we can be socially destructive...I hope that you share my conviction…that we can produce beliefs of enough improved validity and subtlety to make continuing in our profession worthwhile. If you are convinced of both need and possibility, I call upon you vigorous youngsters to take up the task of creating an adequate social theory of validity-increasing applied social science.’ The history of evaluation over the last forty years suggests to me that we have on the whole been unable to follow this advice.
Donald Campbell was not the only great evaluation theorist to have doubts about the validity of experimental design in later life. Carol Weiss, a proponent of experimental design in her early years, along with others such as Peter Rossi and Huey Chen, advanced the idea of theory-driven evaluation and the need to test ‘theories of change’. Carol Weiss is often misquoted as developing ‘theory of change’ (see for example this expert panel report on theory of change). But note the plural: it was about theories, not ‘a’ theory. This move was a response to experimental evaluations that described what happened but offered little insight into why it happened or how to improve a program. Focusing on the reasons and assumptions was a great advance. But what was actually being proposed had little to do with ‘theory’ in the way we usually understand the term.
It is true that ‘theory’ is used in many ways, including, as some have said, lower-case ‘t’ theory. Theory can mean different things. It can be:
An idea intended to explain facts or events that is presented as possibly true.
A set of principles on which the practice of an activity is based.
An idea used to account for a situation or justify a course of action.
Donaldson & Lipsey (2006) identify three uses of ‘theory’ in evaluation that align with the three definitions:
Substantive social science theories on which programs are based
An approach to doing evaluation
A program theory – or reasons to think a program will work
It seems that what we mean by theory in ‘theories of change’ or even ‘theory of change’ relates to the third definition in the sets of dot points above. It has little to do with scientific theory. It is for testing a scientific theory that we do experiments. So, if we are more concerned with reasons and assumptions, why do we say we are testing theories or running experimental designs in evaluation? It seems evaluation is better suited to the domain of reasoning, argumentation, and propositional logic as it relates to action than to the domain of science.
I suggest that evaluation as a discipline has moved through two phases relating to ‘theory’ and requires a third phase. This may build on the work of Campbell, Weiss and many others and bring evaluation into line with how it is actually practiced, and how it may be valuable to address the major problems confronting humanity.
Phase 1: Program as a theory – therefore experimental design used to test the theory (1920s-1970s and still alive today).
Phase 2: Programs as built on theories – therefore evaluation to test the reasons and assumptions or ‘theories of change’. Later this became the mechanisms and contexts that generate outcomes on a realist understanding (1970s-2010s).
Phase 3: Programs as drawing on theories, but better understood as a reasonable proposition for a course of action. Action occurs in an increasingly complex world where knowledge of cause and effect is elusive and where diverse opinions about what is ‘good’ must be included (2010s-).
Why does this matter?
So, what if we use a little ‘creative licence’ to label a reasonable plan of action a ‘theory of change’? I will argue that this important development is an incomplete step and hinders evaluation from making its full contribution to public policy for at least four reasons.
As a ‘theory’ it immediately sounds like the domain of academics and researchers and may serve to exclude rather than include a broad range of people in discussion about what it is and what it is supposed to do and what makes it a good idea. It obscures the ‘values’ implicit in any specific program design.
As a ‘theory’ it has connotations of something that we put together for the express purpose of testing it with an experiment. This suggests evaluation is about finding out ‘what works’ in some space- and time-invariant manner. It tends to privilege quantitative and scientific methods for evaluation (rather than methods for establishing facts) over qualitative or participatory approaches to evaluation.
As a ‘theory’ it is suggestive of the general goal of a theory which is to explain the world – this is the domain of science. A search for single answers or general theories about making a change in the world, or even a single program, is very difficult to achieve in a single evaluation study.
In short, ‘theory’ associates a program with a value-free science rather than seeing it as an explicit proposition to do something for some specific outcome that is of value to proponents/funders. This excludes people from discussion, privileges the experiment, and does not tend to empower people to engage with design and evaluation.
When we build a bridge, we do not evaluate it by testing the various theories of science on which we rely for it to stand up; we do not test the theories of gravity, torsion, or sedimentation. We evaluate the bridge by how well it performs its intended function, which includes but is not limited to how well it stands up. And if it doesn’t stand up, we don’t conclude that one or more of the theories on which it is based are wrong, but that we haven’t built it properly. Nick Tilley, one of the founders of realist evaluation, has recently suggested that engineering may be a more relevant model for evaluation than clinical trials. I would extend this idea to a critique of contemporary social science more broadly, and to the question of whether evaluation could be a kind of social science. But there is no room here to cover Bhaskar’s realist theory of science or the possibility of naturalism in the social sciences (Bhaskar 1979). Suffice it to say that in evaluation we often use science, but we are not primarily doing science.
But why the urgency?
There is an urgent need to recast evaluation as a pragmatic discipline concerned with ‘reasoned action for the public good’ rather than a scientific discipline for establishing truth.
Evaluation is at risk of being brought into disrepute for having been unable to identify much in the way of ‘what works’ despite 80-odd years of promises. Finding stable cause and effect relationships in complex phenomena has generally eluded us, and it’s only getting harder.
Evaluation is at risk of being an unethical accomplice to the status quo by allowing program proponents to develop a Program Logic or Theory of Change that provides a veneer of scientific respectability to a proposition or program that is wholly inadequate for its expressed purpose. In practice most theories of change are akin to a ‘strategy on a page’ with a vision (impact), mission (values, reasons, principles) and set of objectives (outputs and outcomes) but little logic or real description of why it will work in the specific context for which it is being considered.
Monitoring is done poorly as it is too often seen to be about collecting data for the eventual measurement of outcomes or test of a theory. Monitoring, if done at all, might focus on outputs or outcomes. It is not usually based on monitoring the key risks to program failure (which might include invalid assumptions). As a result, data is poorly collected by those tasked with implementing a monitoring system because they can’t see the point. Despite what is often claimed, it is not actually being used to identify risks or for adaptive management.
As evaluators, we would appear to be more concerned with praxeology than epistemology, with reason rather than science. As has been stated, this is not because science is unimportant to evaluation. Social science theories are an important component of program design as they provide one type of reason to accept a program design as valid. However, evaluation is not primarily about testing these theories, but about whether we have leveraged them effectively.
Many practicing evaluators already know this. ‘Program logic’ when done reasonably provides the key parts of a proposition for reasoned action (Hawkins 2020). It will often note reasons, assumptions, and external factors. It rarely if ever actually sets out a ‘chain of cause and effect’ (despite often claiming to) and will often display a series of preconditions for moving a system from one condition to another. It is limited only by failing to see its nature as a specific proposition or argument (which may be valid and well-grounded) rather than any kind of theory (which aims to be true) about how we make a change.
If program theory or theory of change is a claim about what will happen when we pursue a particular course of action, and the reasons and assumptions that underpin this claim, then we should test these claims, reasons, and assumptions. An evidence-based program should provide good reasons to expect actions will lead to outcomes in the future (not just provide the history or outcomes of past similar programs somewhere and somewhen). A sound or logical evidence-based program is a proposition that makes sense ‘on paper’ (i.e., the premises and the conclusions form a valid argument) & ‘in reality’ (i.e., the premises and conclusions are well-grounded or can be seen to actually occur once we implement the program). We still need to collect empirical data, but we do it to test a proposition not any kind of theory.
Program evaluation should primarily be about determining the soundness of a proposition before, during and after implementation (Hawkins 2020). Treating programs and their evaluation in this way may lead to better program design and to more cost-effective monitoring and evaluation that is focused on managing the risk of failure rather than on attempts at generating general knowledge about ‘what works’.
We cannot afford to continue the charade that answers to all our problems lie in more science rather than more reasonable action. Our survival as a species may rely on our ability to put science to use, in conjunction with our values, to design and evaluate reasonable propositions for the public good.
Bhaskar, R. (2014). The Possibility of Naturalism: A philosophical critique of the contemporary human sciences (4th ed.). Routledge.
Campbell, D. T. (1984). Can we be scientific in applied social science? Evaluation Studies Review Annual, 9, 26–48.
Donaldson, S. I., & Lipsey, M. W. (2006). Roles for Theory in Contemporary Evaluation Practice: Developing Practical Knowledge. In The Handbook of Evaluation: Policies, Programs, and Practices (pp. 56–75). SAGE.
Hawkins, A. J. (2020). Program Logic Foundations: Putting the Logic Back into Program Logic. Journal of MultiDisciplinary Evaluation, 16(37).
Tilley, N. (2016). EMMIE and engineering: What works as evidence to improve decisions? Evaluation, 22(3), 304–322.
Weiss, C. H. (1995). Nothing as practical as good theory: Exploring theory-based evaluation for comprehensive community initiatives for children and families. In J. Connell, A. Kubisch, L. Schorr & C. Weiss (Eds.), New approaches to evaluating comprehensive community initiatives (pp. 65–92). New York: The Aspen Roundtable Institute.
Sources for the definitions of ‘theory’: 1, The Britannica Dictionary; 2 and 3, ‘Theory’ definitions from Google Oxford Languages.