1. Risk-based control in Spain: A foundation for improved analytics

Abstract

This chapter provides an overview of the General Comptroller of the State Administration (Intervención General de la Administración del Estado, IGAE) and its oversight of public grants and subsidies in Spain. It describes the IGAE’s current approach to risk-based planning, and highlights preconditions and considerations for the IGAE to advance its use of grant data for assessing fraud risks. This includes considerations and recommendations for ensuring effective data governance and data management, as well as building capacity for using machine learning models.

Introduction

The General Comptroller of the State Administration (Intervención General de la Administración del Estado, IGAE) exercises internal control over the economic and financial management of the Spanish government. This includes the central government, dependent autonomous bodies in the central administration, state entities under public law, and public business entities. As part of its mandate, the IGAE carries out control activities to ensure sound financial management and compliance with, inter alia, the Organic Law of Budgetary Stability and Financial Sustainability (La Ley Orgánica de Estabilidad Presupuestaria y Sostenibilidad Financiera), the General Law of Grants (Ley General de Subvenciones) and legislation of the European Union (OECD, 2014[1]). The IGAE also investigates high-risk areas for potential fraud and irregularities, including public grants and subsidies that support the achievement of Spain’s public policy goals.1

The public grants and subsidies that the IGAE oversees amount to EUR 89 860 million of the total annual budget and involve thousands of beneficiaries and entities. Given the size of this audit universe and the high volume of transactions related to grant disbursements, the IGAE has developed a risk-based approach to help it target the highest risks and manage its resources efficiently. The IGAE’s risk criteria are predetermined and take into account the potential for fraud and irregularities, as described in this chapter.

The IGAE has developed a risk-based approach for its control activities, but opportunities remain for it to make better use of existing data and new methodologies to further target its resources to high-risk areas. This chapter explores key considerations for the IGAE to advance its use of data and analytics. As described in Chapter 2, the project focused on a specific methodology, inspired by machine learning; however, the considerations in this chapter are more generally applicable regardless of the technique or methodology. Moreover, while the OECD project focused on enhancing detection of grant fraud risks, the insights from this chapter and the next are applicable to other types of risk analyses when reliable data are available.

Overview of the grant cycle and the IGAE’s oversight responsibilities

The IGAE follows a decentralised operating model, with three central service functions delivering its core areas of responsibility at the central government level, including the National Audit Office (Oficina Nacional de Auditoría, ONA), the Public Accounts Office (Oficina Nacional de Contabilidad, ONC), and the Office of Finance and Information Technology (Oficina de Informática Presupuestaria, OIP). The IGAE has both ex-ante and ex-post responsibilities:

Ex-ante, by controlling, before they are approved, activities in the performance of expenditures, revenues, payments and investments, or the general application of public funds, to ensure that management complies with all applicable laws. Ex-ante control is therefore preventive, taking place prior to the adoption of various economic activities, such as contracts, grants, agreements, charges and payroll, among others. It can be exercised in a limited fashion, by examining certain key aspects of economic and financial activities, or it can be exercised in full, by examining all documentation linked to a financial act.
Ex-post, by monitoring on an ongoing basis the status and operation of public sector entities to verify that they comply with applicable regulations and that management conforms to the principles of sound financial management, in particular the achievement of the objective of budgetary and financial stability. The IGAE performs public audits, which can take various forms, including annual accounting regularity audits (reviewing accounting information to verify its conformity with accounting standards), compliance audits (verifying the legality of budget management, procurement, personnel, revenue and grant management) and performance audits (examining operations and procedures to assess financial and economic rationality and adherence to the principles of good governance as a means to detect deficiencies and make recommendations to correct them).

The main results of the IGAE’s audits are summarised in an annual report. When infractions are detected that could result in corruption or fraud, a special report is sent to the Ministry of Finance and Civil Service (Ministerio de Hacienda y Función Pública) in addition to the controlled entity. This reporting promotes improvements over time in the techniques and procedures of economic and financial management as recommendations are acted upon. There are collaboration mechanisms between the IGAE, comptrollers of the autonomous communities and local comptrollers (OECD, 2014[1]).

The mandate to control grants rests predominantly with the ONA and the IGAE’s Grants Monitoring and Reporting Division (División de Control e Información de Subvenciones). However, there are “Delegated Interventions” (Intervenciones Delegadas) at regional or provincial levels, as well as those integrated into ministries and public sector organisations. These entities act as financial controllers and are responsible for ongoing monitoring of financial controls and public internal audits (IGAE, 2020[2]). In addition, Delegated Interventions are tasked with exercising control on public expenditures to third-party organisations, including public grants, loans, and guarantees.

Article 140.2 of Law 47/2003, General Budget (General Presupuestaria), gives the IGAE the power to execute internal control of the public sector with full autonomy vis-à-vis the authorities and other entities whose management it controls (Government of Spain, 2003[3]). This power includes the authority to implement control activities related to beneficiaries of grants, in accordance with article 141 of Law 47/2003 (Government of Spain, 2003[3]) and article 44 of Law 38/2003, General Grants (General de Subvenciones) (Government of Spain, 2003[4]) (IGAE, 2020[5]).

The administration or granting body oversees the general processes of each phase of the grant cycle, and the granting body is responsible for oversight of the beneficiary to ensure compliance with the terms of the grant. For instance, early in the grant cycle, concerns addressed can include whether the grant-awarding agency has correctly generated the grant, and if the grant was awarded, applied and checked accurately. In addition to the oversight by the awarding agency, external bodies, such as the legislature, court of auditors, or other audit bodies, provide additional oversight and controls. The IGAE’s model for overseeing the public grants spans the grant cycle, which generally consists of the following phases:

1. Competition—The conditions a grant beneficiary must meet in order to receive a grant are defined by the awarding agency. The grant-awarding agency approves these conditions, and a request for applications is opened.
2. Selection—Candidates are reviewed and selected based on the quality of their applications against the original set of criteria.
3. Grant execution—If the applicant already meets all of the requirements for the grant, or the grant has a provision for an advance, a payment is then made and the beneficiary must begin the activities required by the grant immediately.
4. Monitoring—After the requirements of the grant are delivered, the beneficiary must present a justification of how funds were spent. The granting agency will review this justification and adjudicate whether any final payments need to be made or if funds need to be clawed back. The latter can take place if an activity was not completed as stipulated by the initial grant.

The IGAE’s investigations and control activities across the grant cycle serve different purposes. For instance, they aim to verify whether the beneficiary obtained and is managing the subsidy correctly. The IGAE may also assess whether the subsidy was justified, and whether the operations covered by the subsidy are legitimate and real. The IGAE will also investigate whether the beneficiary has failed to report material facts to the administration that could affect the financing of the subsidy (IGAE, 2020[5]).

Transparency is emphasised in laws and in practice. For instance, Royal Decree 130/2019 (Government of Spain, 2019[6]) integrates many of the aforementioned laws and reiterates the provisions concerning transparency, access to public information and good governance. This decree, along with EU Regulations 651/2014 (European Union, 2014[7]) and 702/2014 (European Union, 2014[8]), requires that data on these grants and their disbursement be published each year on the National System of Publicity for Subsidies and Public Aid (Sistema Nacional de Publicidad de Subvenciones y Ayudas Públicas, SNPSAP) (Ministerio de Hacienda y Función Pública, IGAE, 2021[9]).

The IGAE’s approach to risk-based planning

The IGAE prepares a plan each year that sets out which ex-ante and ex-post controls will be tested. The plan prioritises the controls that address higher risks and that contribute most effectively to the body’s four overarching goals: 1) combat fraud; 2) increase awareness of control activities among grantees and granting bodies; 3) seek additional value in the control beyond simple verification or repetition; and 4) account for the principles of decentralisation by using all resources, media and tools available for control activities. Testing that was selected for the previous year but not completed also carries over into the yearly plan. The IGAE’s annual plans are subject to change throughout the year if new unforeseen risks emerge (IGAE, 2020[5]). For instance, in 2021, the IGAE selected two controls to evaluate: that no disqualified entities have been awarded a subsidy, and that no grants have been awarded that exceed the European Commission’s regulatory thresholds (IGAE, 2020[5]).

The IGAE typically plans its control activities based on analysis of the National Subsidies Database (Base de Datos Nacional de Subvenciones, BDNS), which contains information on all national grants and their recipients and is managed by the IGAE. The IGAE also relies on CincoNet and Presya, which are Spain’s accounting system and loan accounting system, respectively, as well as complaints. Examples of sources of complaints are the State Agency of Tax Administration (Agencia Estatal de Administración Tributaria, AEAT), individual whistle-blowers, granting organisations, and money laundering investigators. Information on beneficial ownership is available from a variety of sources (e.g. a database of the General Council of Notaries, el Consejo General del Notariado), and the Ministry of Justice (el Ministerio de Justicia) is developing a Registry of Beneficial Ownership that will consolidate different sources. Experience from previous years and contextual knowledge also help the IGAE to determine which areas are high risk and have control weaknesses.

The IGAE’s goals, priorities and resource limitations are also considered when planning activities for the year ahead (IGAE, 2020[5]). To promote the efficient use of its resources, the IGAE adopted a risk-based approach that it describes in its 2021 Financial Audit and Control Plan of Subsidies and Public Aid. Highlighting international frameworks, including those of the Committee of Sponsoring Organizations of the Treadway Commission (COSO) and the European Commission’s Anti-Fraud Strategy, the plan outlines the IGAE’s three main considerations:

1. Grants with the highest perceived risk—as risk indicators, the IGAE considers the amount granted, the level of fraud noted in previous years, characteristics of the grant calls and of the granting, justification and verification procedures.
2. The visibility of the control—the IGAE considers the visibility and impact of the control activity, recognising that high-visibility activities can act as a deterrent (i.e. beneficiaries and other stakeholders are more aware of the IGAE’s surveillance) and they can lead to better management.
3. The “profitability of the available means”—this broadly refers to the IGAE’s consideration of the efficiency of its control activities and the decentralised structure referred to as “Peripheral Services” (los Servicios Periféricos), which includes collaboration with line ministries and departments in regional territories (IGAE, 2020[5]).

In planning and executing its work, the IGAE must follow certain parameters that shape its control activities. Its mandate is limited to the control of subsidies and public aid, including loans, contemplated in Title III of the General Subsidies Law (Ley General de Subvenciones, LGS). In addition, the IGAE’s control activities for 2021 generally focus on 2018 or after, recognising that some grants have multi-year execution periods. The IGAE’s control activities focus primarily on grants and aid financed with national funds, although it is possible that a subsidised action may have also received funding from the European Union (EU) (IGAE, 2020[5]).

IGAE officials highlighted three key areas of fraud risk that are of particular concern in Spain’s public grant-making programmes: 1) over-billing of hours by grantees; 2) double-financing; and 3) excess billing by contractors or third parties.

There is a risk that grantees bill more hours than the service actually provided. Organisations receiving grant funding must report back to the granting agency on how many hours of work were completed by staff on the relevant project. This figure has implications for how much funding the grantee receives. However, as many organisations’ operations are only in part funded by grants, there is a risk that the employee hours incurred as a direct result of grant-related work could be overstated. For example, an organisation could falsely claim that wage costs that would have been incurred in the absence of any grant were instead a direct result of the public funding (see Box 1.1 below for the experience of the U.S. Centers for Medicare & Medicaid Services). The IGAE attempts to control for this risk by mandating work reports on the hours utilised and applying maximum thresholds; however, these approaches can only partially mitigate the risks. IGAE officials highlighted the need for improved data to help detect this type of fraud, including data for wage hours, total company revenue, and typical personnel expenses prior to receiving the grant. This information could be added to the BDNS to support further analysis, officials said. Such analysis could include comparing grantees to their peers to find those using labour hours inefficiently, or before-and-after tests to assess discrepancies between what a firm claims in grant documents and the wages it actually bills.
A second area of concern is that grantees could receive funding from two or more sources, both public and private, at a level that exceeds incurred costs and results in undue profit. The BDNS helps the IGAE to combat this practice, as it includes a list of all national grants given to a single organisation. However, the BDNS does not include grants from the EU. IGAE officials noted that having this additional information on all grants given to an organisation and the total income of each from all sources would be particularly useful in identifying areas of high risk. While this declaration is a requirement for large grantees already, it is not for smaller ones. An expansion of this mandatory disclosure to all grant recipients could be achieved through a self-declaration by the grantee, a web search, or an analysis of the organisation’s financial statements.
Excess billing or outsourcing occurs when a supplier to the grantee overcharges for a particular service or supply, either by charging more than market value or by providing less than the stated amount. This fraud risk is particularly hard to detect, since it is often accompanied by a legitimate paper trail. IGAE officials highlighted the need for leveraging new technologies and data as a means of identifying cases in which this may occur, and to better analyse the environment in which firms and their suppliers operate, the geographical context and the relationships between them. Indeed, analysing relationships can be useful for broader risk analysis, going beyond the analysis of excess billing alone. This can include relationships between suppliers, beneficiaries, beneficiaries’ subsidiaries or related companies, and granting organisations. These types of relationships can lead to misappropriation, or to organisations being granted an excess of funding. IGAE officials noted that the creation of a database that tracks these kinds of relationships would be useful in identifying areas of high fraud risk. See Chapter 2 for an example and further discussion of network analysis techniques.
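
The double-financing test described above reduces to a simple comparison: total funding received from all sources for an activity against the costs incurred for it. The following is a minimal sketch in Python of that comparison, not the IGAE’s method. The record layout, the identifiers, the amounts and the function name flag_double_financing are all hypothetical, and the sketch assumes that funding from EU and private sources has already been collected, which the BDNS does not currently provide.

```python
from collections import defaultdict

# Hypothetical records: (beneficiary_id, project_id, funding_source, amount_eur).
# In practice national awards would come from the BDNS; EU and private funding
# would need other sources (e.g. self-declarations), which is an assumption here.
awards = [
    ("B001", "P1", "national", 60_000.0),
    ("B001", "P1", "eu", 50_000.0),
    ("B002", "P2", "national", 30_000.0),
    ("B002", "P2", "private", 10_000.0),
]

# Hypothetical costs declared per (beneficiary_id, project_id).
declared_costs = {
    ("B001", "P1"): 90_000.0,
    ("B002", "P2"): 45_000.0,
}


def flag_double_financing(awards, declared_costs, tolerance=0.0):
    """Return projects whose combined funding exceeds declared costs.

    Each result is (key, total_funding, excess). The excess is None when no
    cost declaration exists, so that gaps are surfaced rather than skipped.
    """
    totals = defaultdict(float)
    for beneficiary_id, project_id, _source, amount in awards:
        totals[(beneficiary_id, project_id)] += amount

    flagged = []
    for key, total_funding in sorted(totals.items()):
        costs = declared_costs.get(key)
        if costs is None:
            flagged.append((key, total_funding, None))
        elif total_funding > costs * (1 + tolerance):
            flagged.append((key, total_funding, total_funding - costs))
    return flagged


flags = flag_double_financing(awards, declared_costs)
for key, total, excess in flags:
    print(key, total, excess)
```

In this illustrative data, only project P1 of beneficiary B001 is flagged, because its combined funding of EUR 110 000 exceeds declared costs of EUR 90 000. A flag of this kind is a prompt for further review, not evidence of fraud.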

Box 1.1. Targeting of overbilling by the U.S. Centers for Medicare & Medicaid Services

When governments fund third parties, one frequent area of risk is the grantee overbilling work hours, whether in error or with malicious intent. In the United States, the Centers for Medicare & Medicaid Services (CMS) uses a predictive analytics system to try to capture overstatements of this nature. The Fraud Prevention System (FPS) uses a variety of data to evaluate a number of metrics, including:

Rules-based differentiators, such as identifying credit cards or accounts that have been associated with fraudulent behaviour in the past.
Anomaly identification, such as flagging beneficiaries that, when compared, bill larger amounts than similar entities.
Predictive analytics, which identifies beneficiaries that have similar characteristics to known bad actors.
Network analysis through which phone numbers and addresses of beneficiaries are compared to those of known bad actors.

By examining these traits and behaviours, the CMS has identified a number of entities with high-risk billing practices. After being flagged by the analytics programme and investigated further by more traditional means, a number of entities have been blocked from further billing or from practising altogether. The FPS has allowed the CMS to allocate its resources efficiently and effectively. Ultimately, the programme is estimated to have saved taxpayers over USD 200 million, which is a USD 5 return for every USD 1 of investment in the system.

Source: (Centers for Medicare & Medicaid Services, 2014[10])
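
The anomaly identification listed in Box 1.1, which compares an entity’s billing with that of similar entities, can be illustrated with a short sketch. The example below does not reproduce the FPS, whose internals are not described here. It substitutes a common robust statistic, the modified z-score based on the median and the median absolute deviation (MAD), with the conventional constant 0.6745 and cut-off 3.5. The peer groups, entity identifiers and amounts are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical billing records: (entity_id, peer_group, billed_amount).
billing = [
    ("E1", "clinic", 100.0),
    ("E2", "clinic", 110.0),
    ("E3", "clinic", 95.0),
    ("E4", "clinic", 105.0),
    ("E5", "clinic", 400.0),
    ("E6", "lab", 50.0),
    ("E7", "lab", 55.0),
    ("E8", "lab", 52.0),
]


def flag_peer_outliers(records, threshold=3.5):
    """Flag entities billing far more than the median of their peer group.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is less
    sensitive to the outliers themselves than a mean-based score.
    """
    groups = defaultdict(list)
    for entity_id, group, amount in records:
        groups[group].append((entity_id, amount))

    flagged = []
    for group, members in groups.items():
        amounts = [amount for _, amount in members]
        med = median(amounts)
        mad = median(abs(amount - med) for amount in amounts)
        if mad == 0:
            # No spread in the group, so the score is undefined; skip it.
            continue
        for entity_id, amount in members:
            score = 0.6745 * (amount - med) / mad
            if score > threshold:  # only unusually high billing is of interest
                flagged.append((entity_id, group, round(score, 2)))
    return flagged


outliers = flag_peer_outliers(billing)
print(outliers)
```

The usefulness of such a comparison depends heavily on how peer groups are defined. Groups that mix dissimilar entities will produce flags that reflect legitimate differences rather than risk.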

Common considerations for using data and analytics to assess risks

The IGAE is primarily a data consumer in that it relies on data inputs from other government entities to conduct its oversight work and assess risks. As discussed, many of these data are captured in the BDNS, but the IGAE also makes use of other sources, such as accounting systems, loan databases and data on complaints. The IGAE also maintains its own records on the results of control activities and sanctioned cases. IGAE officials highlighted quality checks and controls in place that are meant to ensure the reliability of the data it uses. However, while supporting the IGAE to develop the risk methodology described in Chapter 2, the OECD identified areas where the IGAE could take additional steps to enhance its use of data and analytics regardless of the specific technique or methodology. Broadly, as elaborated in this section, this includes: 1) improvements to the IGAE’s data governance and management; 2) further building its capacity for using data and analytics; and 3) taking into account pitfalls concerning advanced forms of risk assessments, such as limitations of using composite risk indicators and biases.

Strengthen data governance and management

Data governance, and more specifically data management, is the cornerstone for effective analytics, including the approach described in Chapter 2. Regardless of the specific methodology, any “data-driven” approach relies on these elements. The model described in Figure 1.1 highlights the value of organisational, policy and technical aspects for successful data governance.

Figure 1.1. Data governance in the public sector

The data governance model above is relevant from both a whole-of-government and institutional perspective. For audit institutions, data governance and data management are at the forefront of their everyday work. International standards and guidance, particularly those advanced by supreme audit institutions (SAIs), highlight the need for effective data governance to help audit bodies to keep pace with the digitalisation of government and society.2 Government entities beyond SAIs are also tackling the same issues and developing their own data governance frameworks. For instance, in New Zealand, the lead agency for government-held data (Stats NZ) developed a data governance framework for government that promotes better data management and encourages government to adopt a “whole-of-data life cycle approach”. The framework encourages public officials to think more strategically about the governance, management, quality and accountability of the data they use over the entire data life cycle (i.e. from the design and source of the data to its storage, publication and disposal) (OECD, 2019[11]). In terms of data quality, guiding principles include:

Relevancy: the extent to which the data meets the needs of the organisation and its stakeholders.
Accuracy and reliability: the degree to which the data correctly and consistently describes the phenomenon being examined.
Timeliness and punctuality: the speed at which data can be obtained, and the reliability of this measurement.
Accessibility and clarity: the ease of access to the data available, as well as their clarity and affordability.
Coherence and comparability: the consistency of the data and the ease with which it can be combined and compared with other data.
Availability of metadata: the ease with which the underlying information about the data, its structure, and attributes can be found or understood (INTOSAI, 2019[12]).

Using data from multiple sources that are prepared independently from one another can lead to an array of challenges for control bodies when applied to fraud risk detection. The IGAE administers the BDNS and uses it for its own risk analysis, but it is not solely responsible for inputting the data into the BDNS. Public bodies, the Local Administration (la Administración Local), administrations of autonomous communities and public sector foundations, among others, are all required to provide information to the BDNS. The IGAE does not conduct data reliability assessments on all data. As a data consumer, some of the data quality issues that were apparent in the data the IGAE uses, such as errors or missing values, are the responsibility of the agency that inputs the data. Nonetheless, audit and control bodies have an obligation to test the reliability and validity of data according to international standards, such as those of the International Auditing and Assurance Standards Board (IAASB) or the Committee of Sponsoring Organizations of the Treadway Commission (COSO). Moreover, Spain’s own standards for collecting audit evidence, such as International Auditing Norm 500 (Norma Internacional de Auditoría 500),3 emphasise the need for auditors to assess the reliability, accuracy and completeness of data. Therefore, even though the IGAE may be dependent to some extent on the data governance, management and quality checks of data producers (i.e. government entities or other institutions), it also must take steps to independently assess the data it obtains.

As illustrated in Chapter 2, interpreting and cleansing the data for enhancing the IGAE’s fraud risk model was time consuming and resource intensive. During the course of this process, “quick win” improvements to the IGAE’s data management became evident, such as having a data dictionary that clearly describes data fields or ensuring that unique identifiers are uniformly applied across datasets. In general, data of poor quality can reflect issues like missing observations, incorrect information or misnamed variables. Any of these concerns could hinder audit or control bodies from conducting meaningful and accurate analyses of risks and controls. For instance, in the IGAE context, missing values in the data, while common, were a major issue identified while working with various databases to develop the risk model. Missing information or data points can reflect errors or simple oversight by the entity that inputted the data, but they can also be due to purposeful omission. Implementing checks and controls to prevent this from occurring could also serve as an additional means of detecting and preventing fraud. From a methodological perspective, reliance on data of poor quality could lead to ineffective sampling, for example, meaning that a number of instances of grant fraud could go unnoticed every year. Inaccurate or incomplete data could also negatively bias more advanced techniques, such as the machine learning approach elaborated in Chapter 2, resulting in models with low predictive power and ultimately the inefficient allocation of the IGAE’s resources.
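
The two “quick wins” mentioned above, a data dictionary and uniformly applied identifiers, can be made concrete with a small sketch. The example below is illustrative only: the field names, the FieldSpec structure and the identifier convention in normalise_id are assumptions for the purpose of the example and do not describe actual BDNS fields or rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldSpec:
    """One entry of a data dictionary: what a field means and what it holds."""
    description: str
    dtype: type
    required: bool = True


# Hypothetical data dictionary; field names are illustrative, not BDNS fields.
DATA_DICTIONARY = {
    "beneficiary_id": FieldSpec("Unique identifier of the beneficiary", str),
    "grant_amount_eur": FieldSpec("Amount awarded, in euros", float),
    "award_date": FieldSpec("Date on which the grant was awarded", date),
    "region": FieldSpec("Region of the granting body", str, required=False),
}


def normalise_id(raw):
    """Apply one (assumed) identifier convention: no spaces or hyphens, upper case."""
    return raw.replace(" ", "").replace("-", "").upper()


def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is None or value == "":
            if spec.required:
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(f"{name}: expected {spec.dtype.__name__}")
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: field not in data dictionary")
    return problems


good = {
    "beneficiary_id": normalise_id(" b-123 45x"),
    "grant_amount_eur": 1000.0,
    "award_date": date(2019, 5, 1),
}
bad = {
    "beneficiary_id": "",
    "grant_amount_eur": "1000",
    "award_date": date(2019, 5, 1),
    "amt2": 5,
}
print(validate(good))
print(validate(bad))
```

Normalising identifiers before merging datasets helps avoid records that fail to match only because of formatting differences, while the dictionary check makes missing or mistyped fields visible at the point of entry rather than during analysis.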

The IGAE could take additional steps to ensure that the data in the systems and sources it uses are reliable. Put in context, confirming that data are reliable means the IGAE would deem them sufficient and appropriate specifically for fraud risk analysis and the methodology it selects. In other words, are the data complete and accurate, and do they truly describe the key concepts under scrutiny? As a data consumer, the IGAE could work with the institutions and organisations that source the data it relies on to address some of the issues described above and in Chapter 2, and ensure the existence of sound internal controls over the data. This includes the policies and procedures that govern data collection, management, storage and use.

Generally, such controls can be categorised in three ways: 1) general controls, 2) application controls, and 3) user controls (United States Government Accountability Office, 2019[13]). General controls apply to the institution’s information systems as a whole, while application controls are those built into the application to make certain that all actions within it are valid, accurate and complete. User controls are those administered by individuals to improve the reliability of the information system. By understanding the controls that are already in place, the IGAE can have better assurance regarding the reliability of the data specifically for assessing fraud risks. Moreover, drawing from the OECD’s experience working with the data the IGAE uses for fraud detection, the IGAE could pay special attention to the following issues when judging the reliability of the data it uses:

Verify the total number of records provided against summary statistics.
Check for missing observations, accounting for all necessary columns or rows.
Confirm that none of the records are duplicated.
Search for dates outside of the desired range.
Search for values that are extreme outliers.
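
The five checks listed above lend themselves to automation. The sketch below shows one possible way to run them over an extract using only the Python standard library. The field names, the sample records, the control total and the date range are hypothetical, and the outlier rule (a modified z-score above 3.5) is one common convention chosen for illustration, not a rule prescribed by the source.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical extract; field names and values are illustrative only.
records = [
    {"grant_id": "G1", "amount": 1_000.0, "award_date": date(2019, 3, 1)},
    {"grant_id": "G2", "amount": 1_200.0, "award_date": date(2019, 6, 15)},
    {"grant_id": "G2", "amount": 1_200.0, "award_date": date(2019, 6, 15)},
    {"grant_id": "G3", "amount": None, "award_date": date(2020, 1, 10)},
    {"grant_id": "G4", "amount": 900.0, "award_date": date(2012, 1, 1)},
    {"grant_id": "G5", "amount": 1_100.0, "award_date": date(2020, 2, 2)},
    {"grant_id": "G6", "amount": 250_000.0, "award_date": date(2021, 4, 4)},
]


def reliability_report(records, expected_count, start, end, required, threshold=3.5):
    """Run the five basic reliability checks and return the findings as a dict."""
    report = {}

    # 1. Total number of records against the summary statistic provided.
    report["count_matches"] = len(records) == expected_count

    # 2. Missing observations in each required field.
    report["missing"] = {
        field: sum(1 for r in records if r.get(field) is None) for field in required
    }

    # 3. Duplicated records, identified here by a repeated grant_id.
    counts = Counter(r.get("grant_id") for r in records)
    report["duplicates"] = sorted(k for k, n in counts.items() if k is not None and n > 1)

    # 4. Dates outside the desired range.
    report["out_of_range"] = sorted(
        r["grant_id"]
        for r in records
        if r.get("award_date") is not None and not (start <= r["award_date"] <= end)
    )

    # 5. Extreme outliers, using a modified z-score based on the median and MAD.
    amounts = [r["amount"] for r in records if r.get("amount") is not None]
    outliers = set()
    if amounts:
        med = median(amounts)
        mad = median(abs(a - med) for a in amounts)
        if mad > 0:
            for r in records:
                amount = r.get("amount")
                if amount is not None and 0.6745 * abs(amount - med) / mad > threshold:
                    outliers.add(r["grant_id"])
    report["outliers"] = sorted(outliers)
    return report


report = reliability_report(
    records,
    expected_count=6,
    start=date(2018, 1, 1),
    end=date(2021, 12, 31),
    required=["grant_id", "amount", "award_date"],
)
for check, finding in report.items():
    print(check, finding)
```

Running checks like these each time an extract is received produces a record of data quality over time, which can in turn inform discussions with the entities that input the data.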

The IGAE can also look at documentation or manuals explaining how the information systems are designed, but in this case it would also need to verify that the way the system is functioning in actuality does indeed adhere to this benchmark. As another check, data could also be traced back to its source material to ensure the two are consistent (United States Government Accountability Office, 2019[13]).

Build capacity for data-driven risk assessments and analytics, particularly competencies for working with large-scale datasets and data visualisation

Data architecture, data infrastructure and capacity for implementation were highlighted by IGAE officials as some of their top priorities for enhancing data use and analytics in general. These areas were the focus of several OECD recommendations for the IGAE and the ONA to strengthen its continuous supervision system, in part, by automating processes for importing data, as well as enhancing efforts to validate and corroborate self-reported data (OECD, 2021[14]). In the context of assessing fraud risks, given the storage and scale of most grants and related datasets the IGAE uses or could access in the future, such as company registry data, the ability of government servers to manage the volume of data in a timely, reliable manner is critical for data extraction. For large datasets of several million records, even basic data cleaning and analytical work can require the use of high-capacity servers. IGAE officials highlighted the need to enhance the IGAE’s data infrastructure. However, for purposes of this project and assessing fraud risks in public grant data, the existing infrastructure is sufficient for more advanced forms of risk analysis, as evidenced by the machine learning methodology described in Chapter 2.

As a more immediate need to implement the said methodology and similar analytics, the IGAE could build its internal digital competencies to manipulate large-scale datasets (i.e. hundreds of thousands or millions of observations) and to implement advanced statistical methods, such as Random Forests, as described in Chapter 2. The pre-processing phase (the data creation, extraction, merging and organisation of the dataset that come before the actual analysis) is time-consuming and costly, and it requires data literacy to process and clean the data. Costs often depend on the quality and openness of government data systems. With some exceptions, the IGAE has the authority to access many databases that can be used for fraud detection, but taking the time to process poor quality data can drive up costs.

In addition to data quality, cost drivers can include the existence of digitised, centralised and structured grant datasets, as well as the format in which they are stored and the corresponding ease of extracting the relevant fields. For this project, the OECD supported the IGAE to create a database that can be used for fraud risk analysis, regardless of the methodology used, thereby reducing such costs in the future. However, data, like risks themselves, are not static and they require the right mix of technical skills and risk expertise to be routinely updated. For instance, to further improve its capacity for carrying out data-driven fraud risk assessments, the IGAE could continue building a multi-disciplinary team with expertise in grant operations, fraud risk management, analytics and data visualisation.

The methodology in Chapter 2 made use of open source software (i.e. Python and R). While many audit institutions rely on paid software (e.g. IDEA, ACL, SAS or Stata), there is no one-size-fits-all solution and many entities in search of a more robust tool than Excel have developed effective analytics based on open source tools. In general, the objectives of the analysis, as well as the skills and expertise of auditors, will determine which tool is most appropriate. For instance, the Austrian Court of Auditors (ACA) developed a tool to monitor the financial health of Austrian municipalities. The tool operates mainly through the statistics software R and enables criteria-based comparison of municipalities and identification of those that pose the highest financial risk. The ACA found that R was better equipped than Excel for analysing big data and less prone to error, and that the R code could be readily re-used in future evaluations with minor adaptations. The learning curve for ACA analysts was significant, according to ACA officials, given the level of detailed technical expertise required. Nonetheless, having in-house expertise in these applications and coding languages has become a standard skillset for many audit institutions that have advanced their analytics capacities in recent years.

The capacity to leverage data and analytics goes hand-in-hand with data visualisation skills. Visualising data in a way that helps users to understand and act on results requires knowledge of data visualisation principles as well as familiarity with, if not expertise in, specialised software that can produce dashboards and facilitate auditors’ understanding of risks (e.g. the R Shiny package, or Tableau). IGAE officials highlighted such tools and dashboards to support analyses of the BDNS as one of their top priorities. Currently, the IGAE makes little use of data visualisations to assess grant fraud risks. Network analysis to identify conflicts of interest is one area that lends itself well to visualising risks (see Chapter 2).

Users who have in-depth knowledge about grant processes, available databases and risks are critical for building an effective analytics capacity and data-driven approach to risk assessments. The IGAE has a team with a strong foundation in all these areas, but could invest further in expertise in analytics and data visualisation in order to advance its digital capacities. Creating, validating and analysing fraud risk models requires both an in-depth understanding of the grant-giving and implementation process and advanced analytical skills. Specific knowledge about grants and subsidies helps to understand the scope of data and variable definitions, as well as the regulatory framework that governs the grant cycle. These various capacity issues, many of which reflect the needs of the IGAE, highlight the importance of having clear objectives and priorities when developing an analytics capacity.

While new data-driven approaches can be a catalyst for broader change, making effective use of data and analytics requires more than simply introducing new tools, techniques or data sources. Moreover, questions about building an analytics capacity would likely need to account for other aspects of the IGAE’s work beyond the scope of this project. For instance, how the IGAE builds its capacity to enhance its analytics for assessing fraud risks could tie into its broader digitalisation strategy, its goals and resources for enhancing data architecture and infrastructure, or its institutional objectives for more targeted, effective control activities. Box 1.2 describes how the European Union’s Internal Audit Service developed a strategy for enhancing its analytics function by taking an institution-wide approach.

Box 1.2. Developing a strategy for analytics at the European Union’s Internal Audit Service

The European Union’s Internal Audit Service (IAS) has made strides to advance its use of analytics and technology in its investigations and audits over the past few years. It achieved this by devising a strong and cohesive analytics strategy early on, and adhering to it. To begin, the existing Information Technology team carried out an extensive analysis of areas for improvement, including innovations and new technologies that the service could incorporate in its work. The IAS also established an internal group to continue this work, including discovering ways in which data and technology could be used on novel engagements, staying abreast of current best practices, and making the department more efficient through analytics. To drive this effort, the IAS created a long-term strategy around analytics focused on three key areas: 1) developing a robust inventory of knowledge and skills; 2) starting pilot projects; and 3) knowledge sharing. Creating a single organisation-wide strategy helped the IAS to plan audits more effectively, among other benefits.

Source: (Barrigon, 2020[15])

Beware of pitfalls concerning composite risk indicators as well as biases

While risk-based control is part of the IGAE’s annual plan, selecting audits and investigations based on perceived risks is ultimately aimed at maximising value for money for taxpayers. It is therefore critical for the IGAE to be mindful of some of the pitfalls inherent in typical approaches to risk assessments, and to reduce the risk of both false positives and false negatives. One of the most frequent ways of creating (composite) risk indicators is to manually select observed features of well-known salient cases and generalise them by applying the same indicators to the full dataset of cases.

This approach suffers from three major drawbacks. First, it introduces selection bias: particular cases are taken into consideration on the assumption that their characteristics generalise to other observations, without any proof that they are typical or representative of all types of fraudulent schemes. Second, such approaches typically fail to take into account the prevalence of the selected risk indicators (or red flags) among clean and unknown cases. In other words, they often produce high false positive rates, signalling fraud risks when there is no fraud. Third, such approaches typically apply a simple average of individual red flags to produce a composite score, as they lack an understanding of how different indicators coincide with each other or which ones are more important.
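The effect of ignoring red-flag prevalence and of simple averaging can be illustrated with a minimal Python sketch. It rests on invented toy data: the flag names `late_report` and `single_bidder`, and all values, are hypothetical and are not drawn from the BDNS or any IGAE data. The sketch compares how often each red flag appears among sanctioned and non-sanctioned cases, and computes an unweighted composite score.

```python
from statistics import mean

# Hypothetical toy data: each grant has binary red flags and a known outcome.
# Flag names and values are invented for illustration only.
grants = [
    {"flags": {"late_report": 1, "single_bidder": 1}, "sanctioned": True},
    {"flags": {"late_report": 1, "single_bidder": 0}, "sanctioned": True},
    {"flags": {"late_report": 1, "single_bidder": 0}, "sanctioned": False},
    {"flags": {"late_report": 1, "single_bidder": 0}, "sanctioned": False},
    {"flags": {"late_report": 0, "single_bidder": 0}, "sanctioned": False},
    {"flags": {"late_report": 0, "single_bidder": 0}, "sanctioned": False},
]


def flag_rate(cases, flag):
    """Share of the given cases in which the red flag is raised."""
    return mean(case["flags"][flag] for case in cases)


def naive_composite(case):
    """Unweighted average of all red flags (simple averaging)."""
    return mean(case["flags"].values())


sanctioned = [g for g in grants if g["sanctioned"]]
clean = [g for g in grants if not g["sanctioned"]]

# Prevalence of each flag among sanctioned versus non-sanctioned cases.
for flag in ("late_report", "single_bidder"):
    print(flag,
          "| sanctioned:", flag_rate(sanctioned, flag),
          "| non-sanctioned:", flag_rate(clean, flag))

# Composite score per grant under simple averaging.
for grant in grants:
    print(grant["sanctioned"], naive_composite(grant))
```

In this toy example, `late_report` is raised in half of the non-sanctioned cases, so using it alone would flag many clean grants. The unweighted composite also assigns the same score of 0.5 to a sanctioned and a non-sanctioned grant, because it treats both flags as equally informative.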

While not the only approach, the methodology described in Chapter 2 was selected because it addresses these shortcomings and, as discussed below, allows the IGAE to work around some of the peculiarities of the data it uses. The machine learning method in Chapter 2 generalises from all past proven cases (i.e. sanctioned cases) to identify which factors influenced the probability of being sanctioned. This approach leads to a single risk score composed of all relevant features in the data, with the weight of each feature defined to maximise predictive power. The approach also explicitly addresses the problem of false positives and false negatives by learning from both proven positive (sanctioned) and likely negative (non-sanctioned) cases. Nonetheless, no methodology is completely free from the risk of bias or inaccuracies. Being mindful of these risks, and of the inherent tendencies of specific methodologies in this regard, can help the IGAE to take an informed approach to strengthening its current risk assessment methodology. Box 1.3 explores further how the IGAE can control for biases in its models, drawing from international leading practices.
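The idea of learning feature weights from sanctioned and non-sanctioned cases, rather than averaging red flags, can be sketched as follows. This is not the Chapter 2 model: a plain logistic regression fitted by gradient descent is swapped in here as the simplest model that learns weights, and the data, the flag names (`flag_a`, `flag_b`) and the 0.5 threshold are all assumptions made for illustration. Non-sanctioned cases are treated as negatives, although in practice they are only likely negatives.

```python
import math

# Hypothetical counts per combination of two binary red flags:
# (flag_a, flag_b, number sanctioned, number not sanctioned).
# By construction, flag_b is informative and flag_a is not.
cells = [(0, 0, 1, 3), (1, 0, 1, 3), (0, 1, 3, 1), (1, 1, 3, 1)]

X, y = [], []
for a, b, n_pos, n_neg in cells:
    X += [[a, b]] * (n_pos + n_neg)
    y += [1] * n_pos + [0] * n_neg


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Estimated probability that a case is sanctioned."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit(X, y, lr=0.5, epochs=5000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [predict(weights, bias, row) - t for row, t in zip(X, y)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / n
        bias -= lr * sum(errors) / n
    return weights, bias


def confusion(weights, bias, X, y, threshold=0.5):
    """Count true/false positives and negatives at a given score threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for row, target in zip(X, y):
        flagged = predict(weights, bias, row) >= threshold
        if flagged:
            counts["tp" if target else "fp"] += 1
        else:
            counts["fn" if target else "tn"] += 1
    return counts


weights, bias = fit(X, y)
counts = confusion(weights, bias, X, y)
print("learned weights (flag_a, flag_b):", weights)
print("confusion counts:", counts)
```

On this constructed dataset the fitted weight for `flag_a` is close to zero while `flag_b` receives a large positive weight, whereas simple averaging would have weighted them equally. The confusion counts make the trade-off between false positives and false negatives explicit, and moving the threshold shifts cases between the two.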

Box 1.3. Addressing biases in machine learning models

Machine learning models are trained on the data that are available, so they can inherit the biases in those data. The author of the algorithm can also amplify these biases further, purposefully or subconsciously. This is of particular concern for auditors or fraud practitioners, for whom objectivity is of the utmost importance. A number of institutions, including audit bodies and think tanks (e.g. the Brookings Institution), have issued guidance about how to audit artificial intelligence and how to check for biases in algorithms in order to improve machine learning models. These include, but are not limited to, the following:

Algorithms can be periodically and independently audited. The audit could include evaluating the data collection process, monitoring how the programme works, and checking whether it is fairly evaluating sensitive subgroups.
The programme could be compared to risk-assessments prepared by humans to see if it is actually more effective.
Algorithms can be checked for compliance with non-discrimination laws.
Algorithm operators can make attempts to increase human interaction with the programme, striving to ensure the code and metrics being used are understood, and that their relation to key social inequities is being considered.
The operating agency could consider drafting a formal bias impact statement to document its conscious consideration and strategy when managing this challenge.
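One of the checks listed above, whether a programme is fairly evaluating sensitive subgroups, can be sketched in Python by comparing false positive rates across groups. The records, the subgroup labels (`small_entity`, `large_entity`) and the choice of the false positive rate as the fairness metric are assumptions made for illustration; an actual audit would select subgroups and metrics according to the applicable legal and policy context.

```python
from collections import defaultdict

# Hypothetical audit log: (subgroup, flagged_by_model, confirmed_irregularity).
# All records are invented for illustration only.
records = [
    ("small_entity", True, False),
    ("small_entity", True, True),
    ("small_entity", True, False),
    ("small_entity", False, False),
    ("large_entity", True, True),
    ("large_entity", False, False),
    ("large_entity", False, False),
    ("large_entity", True, False),
]


def false_positive_rate_by_group(records):
    """For each subgroup, the share of clean cases that were still flagged."""
    flagged_clean = defaultdict(int)
    clean = defaultdict(int)
    for group, flagged, confirmed in records:
        if not confirmed:
            clean[group] += 1
            flagged_clean[group] += int(flagged)
    return {group: flagged_clean[group] / clean[group] for group in clean}


rates = false_positive_rate_by_group(records)
for group, rate in rates.items():
    print(f"{group}: false positive rate = {rate:.2f}")
```

In this toy log, clean small entities are flagged twice as often as clean large entities. A gap of this kind would not prove bias on its own, but it is the type of signal a periodic audit could investigate further.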

According to the Brookings Institution, some questions that can be pondered and included in such a statement in order to assess and control for biases include:

What will the automated decision do?
Who is the audience for the algorithm and who will be most affected by it?
Does the organisation have training data to make the correct predictions about the decision?
Is the training data sufficiently diverse and reliable? What is the data lifecycle of the algorithm?
Which groups may be treated unfairly or may be impacted disproportionately by the training processes of the model and ensuing analysis?
How will potential biases be detected?
How and when will the algorithm be tested? Who will be the targets for testing?
What will be the threshold for measuring and correcting for bias in the algorithm?
What are the operator incentives?
What is to be gained from the development of the algorithm?
What are the potential bad outcomes and how will the organisation become aware of these?
How open (e.g., in code or intent) will the design process of the algorithm be to internal and external stakeholders?
What intervention will be taken if it is predicted that there might be bad outcomes associated with the development or deployment of the algorithm?
How are other stakeholders being engaged?
What’s the feedback loop for the algorithm for developers, users and stakeholders?
Is there a role for civil society organisations in the design of the algorithm?
Has diversity been considered in the design and execution?
Will the algorithm have implications for cultural groups and play out differently in cultural contexts?
Is the design team representative enough to capture these nuances and predict the application of the algorithm within different cultural contexts? If not, what steps are being taken to make these scenarios more salient and understandable to designers?
Given the algorithm’s purpose, is the training data sufficiently diverse?
Are there statutory guardrails that organisations should be reviewing to ensure that the algorithm is both legal and ethical?

Source: (Canadian Audit and Accountability Foundation, 2019[16]); (Lee, Resnick and Barton, 2019[17])

Conclusion

The IGAE has developed a solid foundation to advance its use of data and analytics for assessing fraud risks in public grant data. The skills and knowledge it has in-house, particularly with respect to the grant-making processes, existing risks and the intricacies of relevant databases, are key elements of the capacity and expertise needed for effectively assessing grant fraud risks. There is no analytical tool or method that can replace this knowledge or expert judgement. Moreover, by some accounts, on-the-spot checks and internal fraud reporting mechanisms are perceived to be the most effective fraud detection measures, ranking higher than data analytics or data mining (Dozhdeva and Mendez, 2020[18]). Nonetheless, with an increasingly digital government and society, oversight bodies like the IGAE will have to evolve out of necessity rather than by choice.

Building on its strong foundation of expertise and knowledge, the IGAE could consider adding capacities for taking advantage of the full potential of the databases at its disposal, in particular by strengthening its capacity for working with multiple large datasets and for visualising data. At the same time, the IGAE could continue to improve its data management and its checks on the quality of data to facilitate the merging of datasets and the assessment of fraud risk in public grants. These are actions that would help the IGAE to mature from an analytics perspective regardless of whether it decides to adopt the specific methodology in Chapter 2. Advancing its use of data and analytics would help the IGAE not only to collect more predictive insights about the risks in public grant programmes, but also to be more efficient and effective in its use of taxpayer money.

References

[15] Barrigon, F. (2020), “Innovation and digital auditing – the journey of the European Commission’s IAS towards state-of-the-art technologies”, ECA Journal, Vol. 1/2020, pp. 97-101, https://www.eca.europa.eu/Lists/ECADocuments/JOURNAL20_01/JOURNAL20_01.pdf.

[16] Canadian Audit and Accountability Foundation (2019), Artificial Intelligence and Auditing: Overview of Potential Impact on Public Sector Auditors, https://caaf-fcar.ca/en/performance-audit/research-and-methodology/research-highlights/3455-research-highlights-3.

[10] Centers for Medicare & Medicaid Services (2014), Report to Congress, Fraud Prevention System, Second Implementation Year, https://www.cms.gov/About-CMS/Components/CPI/Widgets/Fraud_Prevention_System_2ndYear.pdf (accessed on 13 August 2021).

[18] Dozhdeva, V. and C. Mendez (2020), Is fraud risk management in cohesion policy effective and proportionate?, https://www.eprc-strath.eu/public/dam/jcr:dbcbcfde-e024-44a0-a11b-b12456ffe0c5/EPRP%20121%20-%20IQ_Net_Thematic%20paper%2047(2).pdf.

[19] European Commission Anti-Fraud Office (OLAF) (2017), Handbook on Reporting on Irregularities in Shared Management, https://www.eu-skladi.si/sl/dokumenti/navodila/handbook-irregularity-reporting-final.pdf (accessed on 13 August 2021).

[7] European Union (2014), Commission Regulation (EU) No 651/2014, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02014R0651-20210405 (accessed on 13 August 2021).

[8] European Union (2014), Commission Regulation (EU) No 702/2014, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02014R0702-20201210 (accessed on 13 August 2021).

[6] Government of Spain (2019), Royal Decree 130/2019 (Real Decreto 130/2019), https://www.boe.es/eli/es/rd/2019/03/08/130 (accessed on 13 August 2021).

[4] Government of Spain (2003), Law 38/2003, General Subsidies (Ley 38/2003, de 17 de noviembre, General de Subvenciones), https://www.boe.es/buscar/pdf/2003/BOE-A-2003-20977-consolidado.pdf (accessed on 13 August 2021).

[3] Government of Spain (2003), Law 47/2003, of November 26, General Budgetary, https://www.boe.es/buscar/act.php?id=BOE-A-2003-21614&p=20201231&tn=6.

[2] IGAE (2020), Activity report 2019 (Memoria de actividades 2019), https://www.igae.pap.hacienda.gob.es/sitios/igae/es-ES/QuienesSomos/Documents/Memoria_2019.pdf.

[5] IGAE (2020), Approval Of The Audit And Financial Control Plan Of Subsidies 2021 (Aprueban El Plan De Auditorías Y Control Financiero De Subvenciones 2021), https://www.igae.pap.hacienda.gob.es/sitios/igae/es-ES/Control/CFPyAP/Documents/Resoluci%C3%B3n%20Plan%20Auditor%C3%ADa%20Pbca%20y%20CFP%202021.pdf (accessed on 13 August 2021).

[12] INTOSAI (2019), Training Tool on Environmental Data: Resources and Options for Supreme Audit Institutions, https://www.environmental-auditing.org/media/113693/23g-wgea_environmental-data_2019-fin.pdf (accessed on 13 August 2021).

[17] Lee, N., P. Resnick and G. Barton (2019), “Algorithmic bias detection and mitigation: Best practices and policies to reduce consumer harms”, Brookings Institution, https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/ (accessed on 13 August 2021).

[9] Ministerio de Hacienda y Función Pública, IGAE (2021), National System of Publicity for Subsidies and Public Aid (Sistema Nacional de Publicidad de Subvenciones y Ayudas Públicas).

[14] OECD (2021), Enhancing Public Accountability in Spain Through Continuous Supervision, OECD Public Governance Reviews, OECD Publishing, Paris, https://doi.org/10.1787/825740cc-en.

[11] OECD (2019), The Path to Becoming a Data-Driven Public Sector, OECD Digital Government Studies, OECD Publishing, Paris, https://dx.doi.org/10.1787/059814a7-en.

[1] OECD (2014), Spain: From Administrative Reform to Continuous Improvement, OECD Public Governance Reviews, OECD Publishing, Paris, https://dx.doi.org/10.1787/9789264210592-en.

[13] United States Government Accountability Office (2019), Assessing Data Reliability, https://www.gao.gov/assets/gao-20-283g.pdf (accessed on 13 August 2021).

Notes

← 1. The European Union defines irregularities as “any infringement of a provision of Community law resulting from an act or omission by an economic operator, which has, or would have, the effect of prejudicing the general budget of the Communities or budgets managed by them, either by reducing or losing revenue accruing from own resources collected directly on behalf of the Communities, or by an unjustified item of expenditure.” Alternatively, fraud is considered to be “in respect to expenditure, any intentional act or omission relating to the use or presentation of false, incorrect or incomplete statements or documents, which has as its effect the misappropriation or wrongful retention of funds from the general budget of the EU or budgets managed by, or on behalf of, the EU, or non-disclosure of information in violation of a specific obligation, with the same effect, or the misapplication of such funds for purposes other than those for which they were originally granted.” (European Commission Anti-Fraud Office (OLAF), 2017[19]).

← 2. For instance, see the African Organisation of Supreme Audit Institutions research report on integrating big data into public sector auditing (https://afrosai-e.org.za/wp-content/uploads/2020/12/Research-Paper-Integrating-Big-Data-in-Public-Sector-Auditing.pdf); the training tool on environmental data published by the INTOSAI Working Group on Environmental Auditing (https://www.environmental-auditing.org/media/113693/23g-wgea_environmental-data_2019-fin.pdf); or the experiences of the Netherlands Court of Audit in developing an audit framework for algorithms (http://intosaijournal.org/developing-an-audit-framework-for-algorithms/).

← 3. International Auditing Norm 500 was adapted from the International Standards on Auditing issued by the International Federation of Accountants through the IAASB.


Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

https://doi.org/10.1787/0ea22484-en

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at http://www.oecd.org/termsandconditions.