4. Review of 50 programme evaluations from 28 OECD countries

This section presents the evidence from a review of 50 published SME and entrepreneurship policy evaluations. It differs from the reviews in the previous chapter because its coverage is highly selective: it is limited to evaluations in OECD countries published since the 2007 Framework (OECD, 2007[1]), and it imposes a “quality” criterion on the evaluations included. Section 4.1 provides a brief description of how the evaluations were chosen, with full information provided in Annex A. Section 4.2 then presents the big picture findings of the evaluations. Sections 4.3 and 4.4 discuss the policy issues raised and the evaluation approaches used.

Our major criterion for including studies in this review was that they used robust methodologies, thereby enabling policy makers to rely on their findings. Since 2007, a considerable number of evaluations of the impact of SME and entrepreneurship policy have been published that do not meet the Step V and VI requirements of the 2007 Framework (OECD, 2007[1]) or of this Framework; these were not included in this review. Inclusion was therefore determined by passing the OECD threshold, together with a range of other factors set out in detail in Annex A. The ultimate purpose was to reach a balanced conclusion on the effectiveness of SME and entrepreneurship programmes from reliable evaluations, as well as to illustrate good evaluation practice. The criteria set out in Annex A yielded 50 evaluations from 28 OECD member countries, covering the following key SME and entrepreneurship policy areas identified for assessment:

  • Finance;

  • Business Advice, Coaching, Mentoring and Counselling;

  • Internationalisation;

  • Innovation;

  • Enterprise Culture and Skills;

  • Inclusive Entrepreneurship;

  • Regional and Local Evaluations;

  • Cluster Policies; and

  • Support in Areas of Disadvantage. 

Where there were multiple high-quality evaluations of a policy area, we have favoured the inclusion of evaluations from a country for which there were no other high-quality studies.

All 50 programme evaluations are documented in full in Annex B across fifteen dimensions. The selection included as diverse a range of countries as possible in order to avoid the sample being dominated by large countries that have conducted many evaluations. Annex C provides the interested reader with information about a further 25 evaluations that were considered but not included on at least one of the grounds above.

The “big picture” findings of the evaluations are shown in Table 4.1. The left-hand side of the Table documents the policy results for each evaluation; the right-hand side documents the evaluation coverage and quality. Brief scoring notes and explanations are provided at the foot of the Table, with more comprehensive coverage in Annex B.

Column 1 of Table 4.1 provides the study number, enabling the interested reader to obtain full information on the study from Annex B.

The second column shows the 50 evaluations, covering eight main SME and entrepreneurship policy areas. Almost one-third (15) evaluate different aspects of Finance programmes, and there are nine studies of Innovation programmes. A third policy area with several reliable evaluations is Inclusive Entrepreneurship, with eight.

Despite the ubiquity of policy initiatives providing Soft business support in the form of Business Advice/Coaching/Mentoring/Counselling, we were unable to find many evaluations that satisfied our criteria for reliability: there are only 6 reliable evaluations of this kind of support among the 50. We see this as a matter of concern, as is the absence of any reliable Cluster policy evaluations.

A limitation of Table 4.1 is that each programme evaluation is placed in only a single policy area, whereas several cover multiple policy areas. For example, policies to enhance innovation frequently use both public funding and advice, meaning they could, in principle, be placed in the policy areas of Finance or Business Advice/Mentoring/Coaching/Counselling (Soft support). Placing the evaluated programmes in a single policy area could therefore, potentially, be misleading. To address this, the Framework looks closely at any stated policy objectives and categorises the programmes according to what appears to be the dominant focus1. We also favour repeated evaluation studies of the same intervention, on the grounds that policy lessons can be learnt when outcomes differ.

The third column of Table 4.1 provides a verbal description of the results of the 50 evaluations. To make the findings easier to interpret, we compress them into three groups for further discussion below, with a schematic illustration of the grouping following the list.

  • Positive Impact. The first group comprises evaluations whose findings are either exclusively positive or, where multiple performance metrics are reported, the strong balance of the metrics is positive. There are 23 such evaluations and they are defined as Positive.

  • No/Negative Impact. The second group comprises those in which there was either no evidence of impact on any metric or the balance of evidence pointed to a significantly negative effect. These evaluations are defined as No/Negative Impact. There are 6 evaluations in this group.

  • Mixed Impact. The third group comprises those where the impact differs depending on the chosen metric. For example, Study 3 finds a positive impact on sales and employment, but no impact on profitability. The 21 evaluations of this type are classified as Mixed.
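The grouping can be summarised schematically as follows. This sketch is purely illustrative and is not part of the review's methodology: the function name, the labels used for per-metric findings, and the numerical threshold standing in for a "strong balance" of positive metrics are all assumptions made for exposition only.

```python
# Illustrative sketch only: a schematic version of the three-way grouping of
# evaluation outcomes described above. The 0.75 "strong balance" threshold is
# an assumption for exposition, not the rule applied in the review itself.

def classify_outcome(metric_results, strong_balance=0.75):
    """metric_results: per-metric findings, each 'positive', 'negative' or 'none'.
    Returns 'Positive', 'No/Negative Impact' or 'Mixed'."""
    positives = sum(r == "positive" for r in metric_results)
    negatives = sum(r == "negative" for r in metric_results)

    if positives == len(metric_results):
        return "Positive"                      # exclusively positive findings
    if positives == 0:
        return "No/Negative Impact"            # no positive evidence on any metric
    if negatives == 0 and positives / len(metric_results) >= strong_balance:
        return "Positive"                      # strong balance of metrics positive
    return "Mixed"                             # impact differs by metric


# A result pattern like Study 3 (positive sales and employment, no impact on
# profitability) falls into the Mixed group under this schematic rule.
print(classify_outcome(["positive", "positive", "none"]))  # -> Mixed
```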

The overall picture that emerges is broadly positive but, with just over half of the evaluations pointing to either Mixed or No/Negative Impacts, SME and entrepreneurship policies are some way from being given a clean bill of health.

In part, this may be because evaluation outcomes are influenced either by the sophistication of the evaluation, as noted in OECD (2007[1]), or by the policy area under consideration. We now examine both explanations.

OECD (2007[1]) stated that:

“sophisticated evaluations of SME support are, on balance, less likely to provide evidence of policy impact than the evaluations using the less sophisticated approaches” (p. 50).

It would be a matter of real concern if this pattern continued, with the less reliable studies being more likely to point to positive – or negative – impacts. To examine such a link, we show our reliability measure – the Evaluation Quality Score (EQS) – alongside outcomes in Table 4.2. The EQS data are reported in the final column of Table 4.2 and are discussed in more detail below.

Reassuringly, this shows that, amongst the 50 high-quality evaluations documented here, outcomes do not seem to be clearly influenced by the EQS. Other implications of EQS are discussed later.

It is, of course, not possible to reach a judgement about whether, amongst the numerous SME and entrepreneurship policy evaluations that did not meet the reliability requirements of this Framework, there continues to be a link between positive estimated outcomes and low evaluation quality.

A second dimension on which policy impact can be reviewed is whether it varies with policy type. This Framework uses a three-way grouping of policy types, distinguishing between Hard, Soft and Both. These are shown for each evaluation in Column 5 of Table 4.1. There are 33 Hard and 11 Soft programmes, with 6 combining Hard and Soft (i.e. Both).

Using this distinction, Table 4.3 assesses whether, for example, Soft programmes are less likely to be classified as having a Positive outcome. Although the numbers involved are small, of the 6 evaluations with No/Negative outcomes, 3 were among the 11 Soft programmes; the comparable figure for Hard programmes was 2 out of 33.

Given the difficulty of finding reliable evaluations of Soft programmes to include, this suggests the impact of Soft support continues to be open to valid questioning.
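The caution about small numbers can be illustrated formally. The sketch below is not part of the review itself; it simply applies Fisher's exact test, a standard check for small-sample contingency tables, to the counts quoted above (3 of 11 Soft programmes versus 2 of 33 Hard programmes with No/Negative outcomes).

```python
# Illustrative only: testing whether Soft programmes are disproportionately
# represented among the No/Negative outcomes, given the small cell counts.
from scipy.stats import fisher_exact

#        [No/Negative, other outcome]
table = [[3, 8],     # Soft programmes (3 of 11)
         [2, 31]]    # Hard programmes (2 of 33)

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p-value = {p_value:.3f}")
```

With cells this small, a sizeable difference in proportions can still be statistically inconclusive, which is why the comparison above is treated as suggestive rather than definitive.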

Column 4 of Table 4.1 presents, for each study, the extent to which Objectives and Targets were specified – ideally prior to the programme being implemented. Our scoring system awards 1 when the programme had only general Objectives, 2 when it had indicators closely tied to its Objectives, and 3 when these were combined with specific milestones and Target values.

The results, taken from Table 4.1, are very disappointing. Only 1 evaluation out of 50 scored "3" (2%), although 44 scored "2" (88%).

Columns 6 and 7 of Table 4.1 document the scale and duration of the 50 programmes evaluated. They confirm that these programmes are generally large scale and have had a lengthy life span. This is to be expected. First, clearly unsuccessful programmes do not require evaluations to provide evidence of their ineffectiveness. Secondly, as noted in OECD (2007[1]), since evaluations have high fixed costs they tend to focus on large-scale, rather than small-scale, policies and programmes. Finally, there may be an element of survivor bias, with only long-term programmes surviving long enough to merit an evaluation.

This seems to be supported by Table 4.1. Only 4 evaluations were of short-lived programmes of less than 2 years, although a further 8 were of programmes with a lifespan of 2-3 years. In contrast, there were 13 evaluations of programmes that were both currently ongoing and had already had a lengthy lifespan.

Given this diversity, it is unsurprising that expenditure varies considerably between the programmes. Perhaps of greatest concern, however, is that in 10 cases it was not possible to determine the sums involved, either from public sources or from those undertaking the evaluation. In the case of small, short-lived programmes the sums may have been negligible, but some of these programmes are ongoing and have had a lengthy period of operation.

Ideally, expenditure should be linked to impact, so that policy effectiveness can be assessed in terms of a metric such as cost per job created. This would enable more reliable comments to be made on areas of high and low policy effectiveness. Unfortunately, such metrics rarely appear in the individual evaluations. It is therefore not possible to comment beyond the remarks made in relation to Table 4.3.

The final column of the left-hand side of Table 4.1 seeks to capture the impact of the evaluation, in terms of whether the policy makers concerned were aware of its findings and whether any changes to policy took place following the evaluation.

This information was never provided in the published documents consulted and had to be obtained from those undertaking the evaluation. It therefore has all the well-established limitations of self-reported data. Also of concern is that this information could not be obtained in 8 cases.2

Nevertheless, a summary of the impacts documented in Column 8 shows that in 17 cases there was a presentation to policymakers and some changes were implemented. In 14 cases a presentation was made to policymakers, but there was no awareness of any changes to the programme being implemented. In 2 cases the results were published or sent to the policymakers, but not presented to them.

Perhaps the most disappointing finding was that in 7 cases the results were never presented to policymakers and the evaluators were also unaware of any policy changes that followed from the evaluation.

Overall, this suggests that in about one-third of cases the evaluation appears to have had an impact in the sense that policymakers were both aware of its findings and changes to the programme were implemented.3 A case may also be made that evaluations were successful if policymakers were aware of their findings, even if no changes were made. On those grounds almost 75% of the evaluations where an outcome has been specified could claim to be successful. However, the reasonable aim should be to achieve 100% amongst reliable studies.

Column 9 of Table 4.1 shows the performance metrics reported for each of the 50 evaluations. In some evaluations, only a single metric is used to judge effectiveness, whereas in others up to eight different metrics are used. What emerges is the almost bewildering diversity of metrics used by those conducting evaluations of SME and entrepreneurship policies.

Table 4.4 seeks to structure that diversity. It takes only the 12 metrics that are used in more than a single evaluation and shows that, in most cases, their usage varies markedly between the eight policy areas. The two exceptions are Employment, which is used in 28 of the 50 evaluations, and Sales, which is used in 27.

The other metrics are used much less frequently and, as Table 4.4 shows, tend to be concentrated in some policy areas, yet absent from others. For example, the crucial metric of Survival is used in only about one-third of the evaluations, most of which are in the policy areas of Innovation and Inclusive Entrepreneurship. The absence of a Survival metric in 14 out of the 15 Finance evaluations has to be a cause for real concern. A similar pattern emerges from the other rows of Table 4.4 with important metrics such as Value Added and Productivity appearing in comparatively few evaluations and, where they are used, being limited to only a few policy areas.

The policy significance of this patchy and inconsistent use of metrics is that it makes it difficult to take informed decisions – even when evaluations have been undertaken – because each evaluation uses different metrics. It will be recalled that the theoretical ideal is for all policies to have the same marginal impact – such as cost per job created – across all policy areas, implying that there would be no benefit from transferring public funds from one policy area to another.
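This allocation condition can be stated formally. The notation below is introduced purely for illustration and does not appear in the evaluations themselves: suppose policy area $i$ receives expenditure $E_i$ and generates $J_i(E_i)$ jobs from a fixed total budget $\sum_i E_i = B$. Total job creation is then maximised when

$$\frac{\partial J_1}{\partial E_1} = \frac{\partial J_2}{\partial E_2} = \dots = \frac{\partial J_n}{\partial E_n},$$

that is, when the marginal cost per job created, $(\partial J_i/\partial E_i)^{-1}$, is equalised across policy areas. If the marginal cost were lower in one area than in another, shifting funds towards the cheaper area would create additional jobs within the same budget.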

However, to make such a judgement requires the same metric – such as cost per job created – to be used across all policy areas. The evidence from Table 4.4 clearly shows that no single metric is consistently used. Even metrics such as Sales or Employment are only used in about half of the evaluations.

Also of concern is that some policy areas seem to have “favourite” metrics which are not used in other policy areas. This makes it impossible for policymakers to assess, on the basis of evaluations, the benefits of shifting funding from one policy area to another.

The evidence from Table 4.4 points to the value of having at least three “common” metrics to be used in all evaluations of SME and entrepreneurship policies and programmes. It suggests these should be Sales, Employment and Survival. These could then be supplemented by others appropriate for the policy area – such as Patents for Innovation evaluations or Wages for Enterprise Culture and Skills or Areas of Disadvantage evaluations.

It was noted earlier that an important limitation of many SME and entrepreneurship policy evaluations was their failure to take full account of the Survival/Non-Survival of enterprises. This is of particular concern because of the low survival rates of SMEs, and of new firms in particular. Evaluations which report changes in sales or employment amongst recipients only while they are trading therefore risk overestimating the impact of the policy if a large proportion of these firms cease to trade shortly afterwards.

Unfortunately, it appears from column 10 of Table 4.1 that, even amongst this selection of high-quality evaluations, only 15 out of 50 reported taking account of enterprise Survival/Non-Survival.

The final two columns of Table 4.1 present information on the Step Level for each evaluation, together with our more challenging Evaluation Quality Score (EQS).

Using the Six Steps ranking, 43 out of the 50 Evaluations (86%) are ranked at Step VI – the highest possible rank. As noted earlier, the OECD 2007 Framework was only able to identify 6 Step VI studies out of the 41 (15%) that were included. This points to the considerable improvement in the quality – and hence the reliability – of evaluations in this policy area.4

However, this overall improvement in quality has brought with it a recognition that even Step VI evaluations have potentially important limitations. For this reason, Section 4.3.2 sets out the more challenging EQS, on which each evaluation is also scored. These scores were used earlier in Table 4.2, which showed that 30 out of the 50 evaluations scored 4 and 12 scored 5. In most cases the difference between a score of 4 and a score of 5 was that, in the former case, there was either no coverage, or imperfect coverage, of survival/non-survival.

The key lesson is that, for most countries and for most policy areas, there are no longer technical or data-based reasons either for not conducting evaluations or for conducting sub-optimal evaluations.

The 50 evaluations therefore constitute a substantial and reliable body of evidence from which to draw conclusions on the effectiveness of SME and entrepreneurship policy and its constituent policy areas. It is clear there have been considerable improvements in both data and analysis since 2007. Of the 42 evaluations reviewed in the 2007 OECD Framework, only 6 would have been of sufficient reliability to merit inclusion in the current review.

References

[1] OECD (2007), OECD Framework for the Evaluation of SME and Entrepreneurship Policies and Programmes, OECD Publishing, Paris, https://doi.org/10.1787/9789264040090-en.

Notes

← 1. In a small number of cases these were also not clearly specified. Here our judgement was based on the focus of the published evaluation.

← 2. This could be a biased sample in many respects – favouring more recent evaluations or those where memories are more favourable. The reported views on the impact of the evaluation on policy could also be influenced by a desire to seek more work.

← 3. Of course, this does not imply that it was the evaluation findings that brought about the change.

← 4. For example, in 2007 there were no Randomised Control Trial (RCT) studies to report whereas this Framework includes RCTs from Germany, Chile, Mexico, Netherlands and the United Kingdom.
