3. Using the evaluation criteria in practice

Two principles have been developed to guide use of the criteria, supporting evaluators and those involved in designing or managing interventions in developing evaluations that are helpful and appropriate to different contexts and stakeholders. To avoid the criteria being applied mechanistically – discouraging critical thinking, creativity and ownership among participants – these principles should accompany the criteria whenever they are used (OECD, 2019[1]):

  • Principle One: The criteria should be applied thoughtfully to support high-quality, useful evaluation.

  • Principle Two: Use of the criteria depends on the purpose of the evaluation.

The following section elaborates on these two principles and outlines additional key concepts for working with the criteria, including how to adjust the criteria to specific contexts, how to examine the criteria at different moments in time, and how the criteria relate to one another.

Principle one stresses that the criteria should be used thoughtfully. In practice, this means thinking critically about which criteria are most useful to support high-quality, useful evaluation that will be valuable to the intended users. Box 3.1 provides an example of how the criteria can be applied thoughtfully when evaluating an intervention.

Considering the following six aspects and their related questions will assist evaluators in the thoughtful application of the criteria:

  • Context: What is the context of the intervention itself and how can the criteria be understood in the context of the individual evaluation, the intervention and the stakeholders?

  • Purpose: What is the evaluation trying to achieve and what questions are most useful in pursuing and fulfilling this purpose?

  • Roles and power dynamics: Who are the stakeholders, and what are their respective needs and interests? What are the power dynamics between them? Who needs to be involved in deciding which criteria to apply and how to understand them in the local context? This could include questions about ownership and who decides what is evaluated and prioritised.

  • Intervention (evaluand): What type of intervention is being evaluated (a project, policy, strategy, sector)? What is its scope and nature? How direct or indirect are its expected results? What complex systems dynamics are at play?

  • Evaluability: Are there any constraints in terms of access, resources and data (including disaggregated data) impacting the evaluation, and how does this affect the criteria?

  • Timing: At which stage of the intervention’s lifecycle will the evaluation be conducted? Has the context in which the intervention is operating changed over time and if so, how? Should these changes be considered during the evaluation? The timing will influence the use of the criteria as well as the source of evidence.

The most important aspect of deciding how to use the criteria is relating them to the aim of the evaluation and its context and then building the evaluation criteria and questions around this purpose. Box 3.2 gives two examples of how the purpose of an evaluation can be defined.

The criteria are not intended to be applied in a standard, fixed way for every intervention or used in a tick-box fashion. Indeed, the criteria should be carefully interpreted and understood in relation to the intervention being evaluated. This encourages flexibility and adaptation of the criteria to each individual evaluation. It should be clarified which specific concepts in the criteria will be drawn upon in the evaluation and why.

The purpose of the evaluation should be carefully and clearly defined. Stakeholders involved in the evaluation should be included at this stage to ensure that they understand the goal of the evaluation and how it will be used.

Key questions to look at when determining the purpose of the evaluation include:

  1. What is the demand for an evaluation, who is the target audience and how will they use the findings?

  2. What is feasible given the characteristics and context of the intervention?

  3. What degree of certainty is needed when answering the key questions?

  4. When is the information needed?

  5. What is already known about the intervention and its results? Who has this knowledge and how are they using it?

Quality and ethical standards should inform subsequent thinking about methodological approaches, design, implementation and management of the evaluation process.

When adjusting the criteria to a specific evaluation, it is also important to delve into causality and gauge the extent to which the evaluation will be able to attribute effects to the intervention being assessed. These considerations can help manage stakeholder expectations and determine which criteria will be covered in depth. This is most crucial for effectiveness and impact, which are discussed in their respective sections below, but it also applies indirectly to, and may feed into, other criteria. For example, efficiency and sustainability could use the actual (or projected) benefits attributed to the intervention as assessed under effectiveness and impact. An evaluability analysis is a useful tool to check that the purpose is understood in depth.1 Further explanation of how to interpret each criterion is provided in Chapter 4.

The criteria can be applied at different points in time. Each criterion can be used to evaluate before, during or after an intervention, and can be assessed at different moments in the intervention’s lifecycle. However, the interpretation of the criteria and the sources of evidence may differ at different points in time. For example, before an intervention has taken place, assessments of effectiveness and sustainability would be projections; after the intervention, more data will be available from which to draw more solid conclusions.

The criteria – and related evaluation questions – should therefore reflect the following two key aspects related to timing: 1) the point in the lifecycle of the intervention when the evaluation will take place; and 2) the stage of the intervention or point of the result chain on which the evaluation will focus.

The conceptual frame of each criterion does not change if an evaluation is taking place before, during or after an intervention. However, the data and evidence available to assess the criteria and the methods used do change. Evaluators should keep these differences in mind when discussing the (potential) findings with evaluation stakeholders as this may influence the perceived usefulness and credibility of the evaluation. For example, an ex-ante evaluation of sustainability (taking place before the intervention begins) could look at the likelihood of an intervention’s benefits continuing by examining the intervention design and available evidence on the validity of the assumptions about the continuation of expected benefits. After the completion of the intervention, an evaluation of sustainability would look at whether or not the benefits did in fact continue, this time drawing on data and evidence from the intervention’s actual achieved benefits.

When looking back in time, evaluations will have to take into account the context and data available at that time, so as to make judgements based on reasonable expectations of what could or should have been done. It would be unfair to judge the actions of past programme designers based on information available today that was not known to them at the time. However, many evaluations have found that information available at the time was not fully utilised where it could have been, even though doing so would have made the intervention more relevant – for example, where local people were not sufficiently consulted and involved in the design of the intervention. Such an oversight could reasonably have been avoided and should therefore be flagged in an evaluation.

Formulating good evaluation questions is a key part of the evaluation process, and the use of the six criteria interacts with and supports the process of deciding on evaluation questions.2 The process starts with a reflection on the purpose of the evaluation, how it will be used and by whom. Effective engagement with stakeholders through a well-designed participatory process can help evaluators and managers understand how stakeholders will use the evaluation. A deep understanding of the intervention and its context, its objectives and theory of change should complement this discussion with stakeholders. An inception phase can be used to explore these questions and generate (a small number of) key questions that the evaluation needs to address.

Following this initial step, the evaluation criteria then provide a tool for checking from different perspectives to see if anything has been missed, enabling further development and refining of the questions, which are essential to the evaluation design. This makes the process systematic and ensures that the evaluation is comprehensive. It is a crucial part of what constitutes a “better evaluation”. It is highly recommended that this question development phase be undertaken with a clear and coherent overall approach.

The institution commissioning the evaluation may have taken a particular decision to evaluate interventions according to certain criteria. In this instance, the institutional guidance should be followed alongside thoughtful application of the criteria to ensure consistency.

An example of this is Sida’s evaluation manual, which provides examples of standard questions under each of the criteria (Molund and Schill, 2004[6]). Other examples of how institutions interpret and unpack the criteria are available, such as the guidance and technical notes that have been developed for evaluation managers and evaluators at the World Food Programme (WFP, 2016[7]).

The criteria comprise multiple lenses through which an intervention and its results can be viewed. They are interrelated, in that the concepts underpinning each one help to analyse complementary dimensions of the process of achieving results. For instance, effectiveness and impact look at different levels of the results chain, depending on how the objectives of the intervention have been defined. The two criteria are therefore interrelated along the causal chain.

The criteria often depend on each other. For example, an intervention that was not relevant to the priorities of the beneficiaries is unlikely to have the intended impact (unless there are some rather unusual separate channels of impact). An intervention that is poorly implemented (less effective) is also likely to be less sustainable. On the other hand, it is also possible that an intervention could be highly relevant yet ineffective. Or a highly coherent intervention could be very inefficient due to increased transaction costs. Evaluators should explore and reflect on relationships and synergies between different criteria, including considering if and how they are causally related.

Most evaluations draw conclusions based on the findings for each criterion, as well as an overall conclusion based on all criteria, sometimes using a numerical score to rate performance. In drawing conclusions about the intervention, and depending on the purpose of the assessment, evaluators should look at the full picture and consider how to appropriately weight all of the applied criteria. Criteria may be weighted, with some institutions defining a dominant (“knock-out”) criterion: if performance on that criterion is not satisfactory, the intervention is considered unsuccessful (or, if assessed ex ante, is not funded), no matter how well the other criteria score.
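To illustrate how such a weighting and knock-out rule might work in practice, the sketch below combines per-criterion ratings into an overall judgement. The criterion weights, the 1-6 rating scale, the satisfactory threshold and the choice of relevance as the dominant criterion are hypothetical assumptions for illustration only; institutions define their own scales and rules.

```python
# Illustrative sketch of a weighted scoring scheme with a "knock-out" criterion.
# The weights, thresholds and 1-6 rating scale are hypothetical, not OECD standards.

# Hypothetical ratings per criterion on a 1 (poor) to 6 (excellent) scale.
ratings = {
    "relevance": 5,
    "coherence": 4,
    "effectiveness": 3,
    "efficiency": 4,
    "impact": 3,
    "sustainability": 2,
}

# Hypothetical weights reflecting the relative emphasis of this evaluation.
weights = {
    "relevance": 0.15,
    "coherence": 0.10,
    "effectiveness": 0.25,
    "efficiency": 0.15,
    "impact": 0.20,
    "sustainability": 0.15,
}

KNOCK_OUT_CRITERION = "relevance"   # dominant criterion chosen by the institution
SATISFACTORY_THRESHOLD = 4          # minimum rating counted as satisfactory


def overall_assessment(ratings, weights):
    """Return an overall weighted score and a success flag, applying the knock-out rule."""
    # Knock-out rule: an unsatisfactory rating on the dominant criterion makes the
    # intervention unsuccessful regardless of how well the other criteria score.
    if ratings[KNOCK_OUT_CRITERION] < SATISFACTORY_THRESHOLD:
        return None, False

    score = sum(ratings[c] * weights[c] for c in ratings)
    return score, score >= SATISFACTORY_THRESHOLD


score, successful = overall_assessment(ratings, weights)
print(f"Weighted score: {score}, successful: {successful}")
```

In this sketch the knock-out check runs before any aggregation, so a weak rating on the dominant criterion ends the assessment regardless of the weighted total – mirroring the rule that no amount of performance elsewhere can compensate for it.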

More specific linkages between each criterion are discussed in Chapter 4.

As described above, the purpose, priorities, scope and context of the intervention and the evaluation will shape the relative focus on different criteria. Evaluators should consider the relative value each criterion will add. This would include two basic decisions: Is this criterion an important consideration for this evaluation? Is it feasible to answer questions about this criterion?

While users may be tempted simply to apply all six criteria regardless of context, the better approach – the one consistent with the original intent of the criteria and that leads to the highest-quality evaluation – is to deliberately select and use the criteria in ways that are appropriate to the evaluation and to the questions it seeks to answer.

To achieve this, ask questions such as:

  • If we could only ask one question about this intervention, what would it be?

  • Which questions are best addressed through an evaluation and which might be addressed through other means (such as a research project, evidence synthesis, monitoring exercise or facilitated learning process)?

  • Are the available data sufficient to provide a satisfying answer to this question? If not, will better or more data be available later?

  • Who has provided input to the list of questions? Are there any important perspectives missing?

  • Do we have sufficient time and resources to adequately address all of the criteria of interest, or will focusing the analysis on just some of the criteria provide more valuable information?

It is important to strike a balance between flexibility (avoiding a mechanistic application of all the criteria) and cherry-picking (selecting only the easiest criteria, or those that are likely to generate positive results) when using the criteria. Notably, one should not shy away from answering critical questions on impact and coherence, even though these questions can be more challenging at times. Some points for consideration are shown in Figure 3.1 below; it should be noted that these are examples and not a comprehensive checklist.

A good knowledge of the stakeholders involved – in both the intervention and the evaluation – can help identify potential tensions between their different interests and priorities when it comes to the design and implementation of the evaluation. Beneficiaries may be most interested in understanding effectiveness (e.g. whether their children’s health is improving through participation in a malnutrition treatment programme) while implementers may be more interested in understanding efficiency, with an eye to scaling up treatment to more families. In most cases, not all potentially interesting questions can be answered in a single evaluation and choices will have to be made. To increase the likelihood that questions that are left out of the evaluation will be covered elsewhere, it is good practice to document the process and outcomes of discussions about prioritising different stakeholder needs and deciding on evaluation questions.

The criteria definitions and this guidance provide a common platform and set of agreed definitions on which to build. However, tailoring them to the institutional context is crucial. Evaluators and evaluation managers often use the criteria on behalf of a development organisation, ministry, or other institution that has its own specific mandate, policy priorities, evaluation policy, standards and guidance – all as decided by their governing bodies.

When considering how to operationalise and apply the criteria, it is important that evaluators and commissioners carefully consider the organisation’s strategic priorities, culture and opportunities as a background to decision making. The way certain terms – such as impact – are used varies, and it is important to pay close attention to the potential for confusion or misinterpretation of the criteria and their intended focus. This will assist evaluators and commissioners as they apply the criteria, maximise the use of the evaluation’s findings and enhance relevance to the intended user’s needs. Managers should encourage discussion between evaluators, commissioners and the target audience of the evaluation to consider how the criteria should be applied and interpreted. Such a process can support the design of credible and timely evaluations that better meet the users’ needs.

Methodological requirements set by an institution may also have a bearing on how the criteria are applied. Evaluators should refer to the specific requirements and guidance of their own organisation or commissioner. Other relevant sources such as the United Nations Evaluation Group (UNEG) guidance or the Evaluation Co-operation Group’s (ECG) good practice standards and the Active Learning Network for Accountability and Performance in Humanitarian Action’s (ALNAP) guidance on evaluation in humanitarian settings are also very useful where applicable.

Along with being adapted to the institutions in which they are used, the criteria will be understood and applied in ways that reflect the broader policy context, which shapes how evaluation managers, evaluators and stakeholders work with them. For the next decade, particularly for evaluators working in international development co-operation, the Sustainable Development Goals (SDGs) and the 2030 Agenda are the single most important overarching policy framework and set of global goals.

Key elements of the 2030 Agenda include:

  • universal access to the benefits of development

  • inclusiveness, particularly for those at greatest risk of being left behind

  • human rights, gender equality and other equity considerations

  • environmental sustainability, climate change and natural resource management

  • complexity of context and of development interventions

  • synergies among actors engaged in the development process.

This framework influences both interventions and their evaluation, including how the criteria are interpreted as well as the process of evaluation itself (including who is involved in applying the criteria and identifying priority questions). Box 3.3 provides guidance for using the 2030 Agenda to inform national evaluation agendas. Box 3.4 gives an example of how this was done in the German development evaluation system (BMZ, 2020[8]). Similar efforts have been made by several national governments – including Costa Rica, Nigeria and Finland – and non-governmental organisations (NGOs), and these provide useful lessons for evaluators and evaluation managers (D’Errico, Geoghe and Piergallini, 2020[9]).

The revised wording of the criteria definitions also reflects these elements in several ways: for example, by giving particular consideration to context and beneficiary perspectives and priorities when looking at relevance, effectiveness and impact; by taking account of equity of results under effectiveness and impact; and by adopting an integrated approach when looking at coherence. The guiding principles also reflect the 2030 Agenda by encouraging an integrated way of thinking.

Evaluators should work in ways that thoughtfully consider differential experiences and impacts by gender, and the way they interact with other forms of discrimination in a specific context (e.g. age, race and ethnicity, social status). Regardless of the intervention, evaluators should consider how power dynamics based on gender intersect and interact with other forms of discrimination to affect the intervention’s implementation and results. This may involve exploring how the political economy and socio-cultural context of interventions influence delivery and the achievement of objectives.

Applying a gender lens can provide evidence for learning and accountability while supporting the achievement of gender equality goals. Practical steps to apply a gender lens to the evaluation criteria include:

  • evaluators, managers, and commissioners working in ways that are inclusive and lead to appropriate participation in decision making, data collection, analysis and sharing of findings

  • considering the extent to which gender interacts with other social barriers to jeopardise equal opportunity in the intervention

  • considering how an intervention interacts with the legislative, economic, political, religious and socio-cultural environment to better interpret different stakeholder experiences and impacts

  • considering socially constructed definitions of masculinity, femininity and any changes to gender dynamics and roles

  • analysing evaluators’ skills in gender-sensitive evaluation approaches and their experience of working in different contexts when selecting evaluators.

The following table has been developed to help evaluators to reflect on how they can apply a gender lens to the criteria:

The six criteria are intended to be a complete set that fully reflects all important concepts to be covered in evaluations. If applied thoughtfully and in contextually relevant ways they will be adequate for evaluations across the sustainable development and humanitarian fields.

Nonetheless, in certain contexts, other criteria are used. For instance, in their evaluation policies many institutions will mandate analysis of a particular focus area. In 2020, an evaluation of Italy’s health programmes in Bolivia used nine criteria: relevance, effectiveness, efficiency, impact, sustainability, coherence, added value of Italian co-operation, visibility of Italian co-operation and ownership (Eurecna Spa, 2020[11]). Another example involves applying the criteria to humanitarian situations, where criteria such as appropriateness, coverage and connectedness are highly relevant.3 Additionality is a criterion that is sometimes applied, often in the fields of blended finance, non-sovereign finance and climate finance. Various definitions for additionality, including different types of financial and non-financial additionality, are used.4 Depending on the definition used, additionality may be examined under the criterion of relevance, effectiveness or impact. Others treat it as a distinct cross-cutting criterion.

Users should be cautious when considering whether to add criteria, as this can lead to confusion and make an evaluation too broad (providing a less useful analysis). Having a limited number of criteria is useful for ensuring sufficient depth of analysis and conceptual clarity – a point that was repeatedly made during the consultation process when updating the definitions in 2017-2019.

When using other criteria, it is important to define them. An explanation of why they are being added can help ensure that other people understand how the additional criteria fit with the six described here. To support learning across evaluations, it is critical that the same concepts or elements are assessed under the same criteria.

Regardless of which criteria are used, the core principles and guidance provided here should be applied.

References

[4] ADE (2019), Joint Evaluation of the Integrated Solutions Model in and around Kalobeyei, Turkana, Kenya, UNHCR and Danida, https://um.dk/en/danida-en/results/eval/eval_reports/publicationdisplaypage/?publicationid=dd54bef1-1152-468c-974b-c171fbc2452d (accessed on 11 January 2021).

[12] Bamberger, M., J. Vaessen and E. Raimondo (2015), Dealing With Complexity in Development Evaluation - A Practical Approach, https://www.betterevaluation.org/en/resources/dealing_with_complexity_in_development_evaluation (accessed on 11 January 2021).

[8] BMZ (2020), Evaluation Criteria for German Bilateral Development Co-operation.

[5] Bryld, E. (2019), Evaluation of Sida’s Support to Peacebuilding in Conflict and Post-Conflict Contexts: Somalia Country Report, Sida, https://publikationer.sida.se/contentassets/1396a7eb4f934e6b88e491e665cf57c1/eva2019_5_62214en.pdf (accessed on 11 January 2021).

[9] D’Errico, S., T. Geoghe and I. Piergallini (2020), Evaluation to connect national priorities with the SDGs, IIED, https://pubs.iied.org/17739IIED (accessed on 22 February 2021).

[3] Danida (2019), Evaluation of Water, Sanitation and Environment Programmes in Uganda (1990-2017), Evaluation Department, Ministry of Foreign Affairs of Denmark, http://www.oecd.org/derec/denmark/denmark-1990-2017-wash-environment-uganda.pdf (accessed on 11 January 2021).

[13] Davis, R. (2013), “Planning Evaluability Assessments: A Synthesis of the Literature with Recommendations”, No. 40, DFID, https://www.gov.uk/government/publications/planning-evaluability-assessments (accessed on 12 January 2021).

[11] Eurecna Spa (2020), Bolivia - Evaluation of Health Initiatives (2009-2020), Italian Ministry of Foreign Affairs and International Cooperation, http://www.oecd.org/derec/italy/evaluation-report-of-health-initiatives-in-Bolivia-2009_2020.pdf (accessed on 11 January 2021).

[2] Helle, E. et al. (2018), From Donors to Partners? Evaluation of Norwegian Support to Strengthen Civil Society in Developing Countries through Norwegian Civil Society Organisations, Norad Norwegian Agency for Development Cooperation, https://www.norad.no/globalassets/filer-2017/evaluering/1.18-from-donor-to-partners/1.18-from-donors-to-partners_main-report.pdf (accessed on 11 January 2021).

[6] Molund, S. and G. Schill (2004), Looking Back, Moving Forward Sida Evaluation Manual, Sida, https://www.oecd.org/derec/sweden/35141712.pdf (accessed on 11 January 2021).

[1] OECD (2019), Better Criteria for Better Evaluation: Revised Evaluation Criteria Definitions and Principles for Use, DAC Network on Development Evaluation, OECD Publishing, Paris, https://www.oecd.org/dac/evaluation/revised-evaluation-criteria-dec-2019.pdf (accessed on 11 January 2021).

[14] OECD (2002), Evaluation and Aid Effectiveness No. 6 - Glossary of Key Terms in Evaluation and Results Based Management (in English, French and Spanish), OECD Publishing, Paris, https://dx.doi.org/10.1787/9789264034921-en-fr.

[10] Ofir, Z. et al. (2016), Briefing: Five considerations for national evaluation agendas informed by the SDGs, IIED, London, https://doi.org/10.3138/cjpe.30.3.02./11.

[7] WFP (2016), Technical Note: Evaluation Methodology, DEQAS, World Food Programme, https://docs.wfp.org/api/documents/704ec01f137d43378a445c7e52dcf324/download/ (accessed on 11 January 2021).

Notes

← 1. Evaluability is the extent to which an activity or a programme can be evaluated in a reliable and credible fashion. Evaluability assessment calls for the early review of a proposed activity in order to ascertain whether its objectives are adequately defined and its results verifiable (OECD, 2002[14]). See also Davis (2013[13]).

← 2. The process of formulating evaluation questions and involving stakeholders, while also taking into account the complexity of the intervention is discussed in more detail by, for example, Bamberger, Vaessen and Raimondo (2015[12]).

← 3. The Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP) is currently updating its 2006 guidance on using the criteria in humanitarian settings, as a complement to ALNAP’s comprehensive Evaluation of Humanitarian Action Guide.

← 4. EvalNet’s Working Group on Evaluating Blended Finance supported research on the definitions of additionality and related concepts. Findings will be published in early 2021.
