8. Data and technology governance: fostering trust in the use of data

Stéphan Vincent-Lancrin
OECD
Carlos González-Sancho
OECD

The development of a digital education ecosystem should make it possible to use data and digital tools to improve the quality, effectiveness, efficiency and equity of education. A key lever lies in the use and reuse of data, in or near real time, to make better-informed decisions or to evaluate education practices in order to design new reforms. One of the risks that needs to be mitigated with this approach relates to privacy and data protection. In most societies there is a low level of trust in the use and reuse of data and a legitimate discomfort with the possibility of privacy breaches. For this reason, most countries have enacted robust privacy and data protection laws and policies that cover the handling and sharing of data within education systems.

It is important to distinguish between different types of data when reflecting on data governance. Statistical data have long been collected and have their own laws and processes. They are generally made available to the public and will not be discussed as such in this chapter. The chapter mainly focuses on two types of data: administrative data, which are collected in the process of delivering education or education programmes; and, to a lesser extent, commercial data, notably those collected by commercial vendors while students or teachers use educational software in school (or for school).

The digital transformation has also raised awareness about smart technologies that can either make automated decisions or support educators. The emergence of generative artificial intelligence (AI) has made the power of AI visible to all. At the same time, some observers worry that algorithms may be biased and even amplify human biases, even though they also have the potential to limit the interference of human biases in educational decisions. Even though the use of automated systems is very rare in education (and public automated decision-making non-existent), a few countries are in the process of setting new expectations about automated decision-making and AI systems, either through guidelines regarding algorithms and AI or through regulation. The “Opportunities, guidelines and guardrails” presented in this report (OECD, 2023[1]) deal with guidelines around the use of generative AI in education.

The first section introduces the main concepts and practices of privacy and data protection, lays out societal privacy concerns and highlights the need to balance the risks of re-identification of individuals against the value of the collected or shared data. The second section presents the state of the art of countries’ privacy and data protection regulation, showing how they regulate data protection and privacy through tiered-access policies and sometimes through technology. Sharing data with researchers should be a key dimension of countries’ data governance, which may also consider establishing data spaces leveraging the (process) data that are collected by commercial vendors. Finally, the penultimate section reviews a few country efforts to provide guidelines about the use of automated decision-making and AI systems as an upcoming area of regulation. The conclusion highlights the importance of keeping a risk-management approach when it comes to data and algorithm governance.

The data collected by digital administrative tools often contain personal information that identifies students and teachers, either directly or indirectly, and so do some datasets collected by commercial digital education tools and solutions. A central question of data protection in education is privacy (and ultimately safety and well-being), and how to balance it with the use and reuse of data within the digital education ecosystem and with research and innovation uses that involve sharing de-identified data with third parties.

The concept of personal information is central to modern privacy law. The standard legal approach is that some data elements represent personal information – that is, they make it possible to identify individuals – while others do not. Generally, only personal information falls within the scope of privacy legislation. The distinction between personal and non-personal information is thus the basis for assigning rights and responsibilities to individuals (“data subjects”) and to the entities (“data custodians” or ”data controllers”) collecting and managing data records about these individuals.

From a traditional legal standpoint, therefore, privacy risks appear largely confined to those data elements that are considered personal information. Naturally, context plays a role and any given data element may be considered personal or not depending on whether the circumstances and other available information allow a reasonable inference about the identities of the individuals included in the dataset.

The distinction between personal and non-personal data is a standard feature of privacy frameworks across the OECD area. The OECD Guidelines Governing the Protection of Privacy and Transborder Flows of Personal Data define “personal data” as “any information relating to an identified or identifiable individual (data subject)”. The notion of personal data is also paramount to the EU General Data Protection Regulation (GDPR) that came into force in May 2018 replacing and extending the 1995 EU Data Protection Directive (Directive 95/46/EC). Article 4 of the GDPR defines “personal data” as:

“any information relating to an identified or identifiable natural person (‘data subject’); […] one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.

In the United States, personal data is referred to with the term “personally identifiable information” (PII). Under the US Family Educational Rights and Privacy Act (FERPA), which sets legal requirements to protect the privacy of education records, PII includes, but is not limited to:

“(a) The student’s name; (b) The name of the student’s parent or other family members; (c) The address of the student or student’s family; (d) A personal identifier, such as the student’s Social Security Number, student number, or biometric record; (e) Other indirect identifiers, such as the student’s date of birth, place of birth, and mother’s maiden name; (f) Other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty; (g) Information requested by a person who the educational agency or institution reasonably believes knows the identity of the student to whom the education record relates” (34 CFR § 99.3).

A basic tenet of privacy regulation is that the protection of natural persons applies to the processing of their personal data, that is, to the operations performed on these data, such as collection, storage, structuring, adaptation, analysis or dissemination. The OECD Privacy Guidelines apply specifically to personal data and establish eight guiding principles for data collection and data use to be respectful of personal privacy (Box 8.1). While originally formulated in 1980, the revision of the guidelines in 2013 reconfirmed the principles as relevant in “an open, interconnected environment in which personal data is increasingly a valuable asset” and where “more extensive and innovative uses of personal data bring greater economic and social benefits, but also increase privacy risks” (OECD, 2013[2]).

It is important to underline that legal frameworks such as the GDPR and FERPA do not prohibit the exchange or processing of personal data but, instead, lay restrictions on it and establish that a legal basis (such as national law) is necessary for specific processing situations. Among those, data use in the public interest, including for scientific, research or statistical purposes, often merits a set of special provisions (e.g. Article 89 of the GDPR).

Personal data can be generated in many ways with varying degrees of individual involvement and awareness of the data generation process (Abrams, 2014[4]). First, personal data may be provided or revealed by choice, as through surveys; they can also be revealed through compulsory disclosure, as when set as a pre-condition to receiving services, as in the case of school registration. Second, they may be created without full consent or awareness from the data subject, as in the form of data traces from online tracking or sensor observation (e.g. Buckley et al., 2021[5]). In addition, personal data can increasingly be derived or inferred from other existing data, either mechanically or by probabilistic means. In the context of administrative collections, data subjects normally retain visibility and control over the information that concerns them. More and more, however, individuals also create and share, either consciously or inadvertently, personal information about themselves and others (e.g. schoolmates, teachers) on platforms such as social networks, photo-sharing sites or rating systems. These platforms are often part of larger application ecosystems that can access personal data from multiple online services (e.g. contact lists through mobile phone applications), making it possible not only to combine users’ information but also to infer “shadow profiles” of non-users, which highlights the collective aspects of privacy in the new digital environments (García et al., 2018[6]).

Legal definitions reflect these complexities by underscoring that personal data may either identify individuals directly or enable their identification through other, less straightforward means. This relates to the standard distinction between “direct” and “indirect” (or “quasi”) identifiers, both of which fall under the broader concept of personal data.

Direct identifiers are data elements that provide an explicit link to a data subject and can readily identify an individual. For students and their families, examples of direct identifiers include names, addresses, social security or other administrative numbers or codes, unique education-based identification numbers or codes, photographs, fingerprints, or other biometric records. Other forms of data generated digitally within or outside educational settings can also apply. For instance, under the EU GDPR, data elements such as an email address, location data from mobile phones or an Internet Protocol (IP) address can also be considered personal data. For teachers and other school staff, most of the same variables can serve as direct identifiers, as well as other data elements such as teaching assignments or scores on professional evaluations.

In turn, indirect or quasi-identifiers are data elements that, despite not being unique to a particular data subject, may still be used to identify particular individuals, normally in combination with other available information. Examples of indirect student identifiers include postal codes or other location information, gender, racial or ethnic identity, date and place of birth, grade level, course enrolment, participation in specific educational programmes, or information about transfers between schools or institutions. For teachers, quasi-identifiers can also include marital status, income, information on credentials, certification and training, tenure status or teaching assignments, among others.

There is a grey area around the identifiability potential of other types of education data. For instance, data about school menus or about textbooks and other learning materials assigned to students in particular courses may not contain any personal information. However, combined with other data elements, such information may serve to identify individual students or educators.

Moreover, when the possibility of linking student and teacher records exists, student-level information may easily become an indirect identifier for an individual teacher, and vice versa. This is particularly important for longitudinal information systems that maintain information on both students and educators; for such systems, any detailed inventory of personal data elements requiring privacy protection should include elements relating to both categories of data subjects and potential linkages between them (NCES, 2011[7]).

The conceptual distinction between personal and non-personal data informs the idea that suppressing personal information from a given dataset is an effective strategy to eliminate privacy risks. Within this view, privacy protection takes aim at the data elements themselves and relies chiefly on the de-identification of individual-level personal records. However, the notion of a meaningful divide between personal and non-personal information based on their ‘identifiability’ potential is increasingly put into question.

A growing consensus exists among privacy experts that advances in data availability and analytics bring about a dramatic leap in the capacity to relate seemingly non-personal data to identified or identifiable individuals in a variety of contexts, thereby multiplying opportunities for (re-)identifying data subjects (for an overview, see National Academies of Sciences, 2017[8]). This possibility challenges regulatory approaches that establish privacy rights and restrictions to data use based on the distinction between personal and non-personal information.

Some examples illustrate how developments in data collection and analytics have transformed the playing field by increasing the capacity to infer sensitive information from seemingly safe data. In a famous study, Sweeney (1997[9]) was able to identify the medical records of a state governor in the United States by matching information on date of birth, postal code and gender found in publicly released voter registration records and medical encounter data.1 Researchers also managed to re-identify Netflix subscribers by matching their rating records and reviews on the Internet Movie Database (IMDb) based on movie titles and release dates (Narayanan and Shmatikov, 2008[10]). Another example is the use, by retail company Target, of historical purchase data to infer the likelihood that female customers were in early stages of a pregnancy and subsequently adapt marketing offers (Duhigg, 2012[11]). Analyses of credit card transactions and location-temporal data for more than 1 million individuals have also shown that only four data points may be needed to uniquely identify about 90% of the data subjects within these large datasets (de Montjoye et al., 2015[12]) (de Montjoye et al., 2013[13]). Rocher, Hendrickx and de Montjoye (2019[14]) review further examples of successful re-identification in purportedly anonymous datasets and present a statistical model to quantify the likelihood of success of a re-identification attempt on heavily incomplete datasets. These examples show how, despite the application of thorough de-identification and sampling procedures, re-identification remains possible in high-dimensional datasets that provide information on a large number of attributes per individual. Moreover, inferences about sensitive personal traits and about the presence of records from a given individual are also possible through attacks on aggregate data (for a review, see Dwork et al., 2017[15]).

These developments yield a growing sense of unease about the adequacy of traditional privacy protection approaches in the era of Big Data. As put by the President’s Council of Advisors on Science and Technology in the United States,

“Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data.  In general, as the size and diversity of available data grows, the likelihood of being able to re‐identify individuals (that is, re‐associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating” (Executive Office of the President, 2014, p. xi[16]).

Administrative education data are not exempt from these perils. A student or teacher may appear in an external dataset that contains some uniquely identifying information that has been removed from duly anonymised education records. However, if the two data sources can be matched on some other shared or overlapping variables, then the matching brings the possibility of identifying the student or teacher in the education dataset as well as of expanding the amount of information available about them. For example, a student’s school records may be linked to data from a university financial aid service, or to responses to a survey on labour market outcomes. The student’s social security number may appear in the latter sources only, but birth dates may be present in all sources. Through the linkage, the school records could then be attributed to particular social security number holders. In the same vein, the student’s academic record could be linked to health or crime data collected by other public agencies using any shared data elements.

In this example, uncertainty may exist about the identity of students sharing the same birth date. However, auxiliary information from additional shared variables (e.g. gender, postal codes) could also be brought in to establish the matching of pairs of students born on the same day between the different datasets. Overlapping elements that univocally identify individuals allow direct and exact matching, also known as deterministic record linkage; by contrast, when uncertainty remains about the uniqueness of the data records, the matching is still possible through probabilistic record linkage techniques.
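
To make the mechanics of record linkage concrete, the following minimal Python sketch links two invented datasets: first deterministically, on exact matches of shared quasi-identifiers, and then with a naive similarity score standing in for probabilistic linkage. All field names and records are hypothetical, and real linkage pipelines rely on dedicated tools with blocking, weighting and validation steps.

```python
# Minimal sketch of deterministic and (naive) probabilistic record linkage.
# All field names and records are hypothetical illustrations.

school_records = [
    {"id": "S-001", "birth_date": "2006-03-14", "gender": "F", "postcode": "75011"},
    {"id": "S-002", "birth_date": "2006-03-14", "gender": "M", "postcode": "75012"},
]

survey_records = [
    {"ssn": "123-45-678", "birth_date": "2006-03-14", "gender": "F", "postcode": "75011"},
]

QUASI_IDENTIFIERS = ("birth_date", "gender", "postcode")

def deterministic_link(a, b):
    """Exact match on every shared quasi-identifier."""
    return all(a[k] == b[k] for k in QUASI_IDENTIFIERS)

def similarity(a, b):
    """Naive probabilistic-style score: share of quasi-identifiers that agree."""
    return sum(a[k] == b[k] for k in QUASI_IDENTIFIERS) / len(QUASI_IDENTIFIERS)

for s in school_records:
    for t in survey_records:
        if deterministic_link(s, t):
            # The de-identified school record is now tied to an SSN holder.
            print(f"Exact match: school record {s['id']} <-> SSN {t['ssn']}")
        elif similarity(s, t) >= 2 / 3:
            print(f"Probable match ({similarity(s, t):.2f}): {s['id']} <-> {t['ssn']}")
```

Even this toy example shows how a handful of overlapping variables is enough to attribute a de-identified education record to a named individual in an external source.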

Additionally, re-identification of a student or teacher may occur through the combination of education data records with information from other types of sources, such as events reported in the media. For instance, a student’s school records may reveal that disciplinary action was taken for involvement in a bullying incident, without further specifying the identity of other students involved or the presumed motivations of the event. However, local media reports of the incident, even without revealing the names of the students, may contain additional information about the event. The date of the incident or the reported age of the students could then be used by someone with access to both sources to re-identify the involved students.

While linkages to external sources may provide opportunities for re-identification, it is important to note that identity attribution can also occur through the combination of data elements within a given education dataset, or by inferring the identity of an individual in a given category from aggregated published statistics. These instances are of particular relevance for students and teachers in population sub-groups represented in low numbers in a given context. For instance, a demographic breakdown of the results of students from a given school or municipality in a public examination may be combined with information on grade levels, gender and ethnicity to identify individual students with a set of observable characteristics and specific academic outcomes.

These examples show how the relative prevalence of indirect personal identifiers is positively associated with the likelihood of re-identification. Highly uncommon personal characteristics of students or teachers can more easily lead to disclosing their individual identities than traits more widely shared in the population. A student’s special education status, a teacher’s unusual certifications or, more generally, postal codes of residents in sparsely populated areas are examples of variables that can be highly informative about individual identities within a given dataset.
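
The link between rare attribute combinations and re-identification risk can be illustrated by counting how many records share each combination of quasi-identifiers. The short sketch below, using invented records and field names, flags combinations that single out one individual.

```python
from collections import Counter

# Hypothetical de-identified student records (no names or direct identifiers).
records = [
    {"grade": 10, "gender": "F", "postcode": "75011", "sen_status": "yes"},
    {"grade": 10, "gender": "F", "postcode": "75011", "sen_status": "no"},
    {"grade": 10, "gender": "M", "postcode": "75011", "sen_status": "no"},
    {"grade": 11, "gender": "F", "postcode": "04999", "sen_status": "yes"},
]

quasi_identifiers = ("grade", "gender", "postcode", "sen_status")

# Count how many records share each combination of quasi-identifier values.
class_sizes = Counter(tuple(r[k] for k in quasi_identifiers) for r in records)

for combo, size in class_sizes.items():
    if size == 1:
        # A combination shared by a single record uniquely identifies someone
        # to anyone who already knows these attributes about them.
        print("Unique (high re-identification risk):", dict(zip(quasi_identifiers, combo)))
```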

Re-identification risks are also dynamic and cumulative. In the case of administrative education data, risk is associated, most importantly, with prior releases of other education records as well as with the information available through other sources. Generally, as the amount and diversity of data grow, possibilities for re-identifying students and teachers using their education records increase exponentially.

Fears that personal data may be accessed without authorisation or used inappropriately have become acute and widespread in recent years. This provides the backdrop for the concern that the release of education records, even when duly anonymised, poses risks for students and teachers.

Inadequate protection against cyber-attacks is a widespread concern, as these may lead to a data breach and a disclosure of personal information. There is evidence that such incidents have increased in scale and profile in recent years. According to a survey commissioned by the UK government, 46% of businesses in the country identified at least one cyber security attack in 2016, with the incidence rising to 68% among large firms (UK Department for Digital, Culture, Media & Sport, 2017). Government agencies have also suffered large data breaches: examples include the theft of more than 21 million records from the US Office of Personnel Management in 2015, a leak at the Japanese Pension Service affecting more than 1 million people in 2015, and the loss of a portable hard drive containing personal information about almost 600 000 student loan recipients in Canada in 2012 (OECD, 2016[17]) (Office of the Privacy Commissioner of Canada, 2015[18]). Overall, the scale of data breaches, including for government records, has been growing over the last decade (Information is Beautiful, 2019[19]). However, while these events contribute to the perception of a low level of data security in digital environments, research on the frequency of cybercrime suggests that actual risks are often over-estimated. Despite the proliferation of cyber-attacks, once their frequency is expressed as a proportion of the number and size of Internet-related activities, the evidence points to an improvement rather than a deterioration of online security (Jardine, 2015[20]). The same could probably be said as a proportion of the amount of available digital data.

Inappropriate use of personal data is also a widespread worry. In the European Union, about half (46%) of respondents to a 2019 Eurobarometer on cyber-security expressed concerns about someone misusing their personal data (European Commission, 2019[21]). In the United States, in 2023, 34% of respondents surveyed by the Pew Research Center reported they had been the target of some form of data breach or hacking in the past year. They also expressed mistrust in the use of their personal data by the government (71% were “very” or “somewhat” concerned about how the government uses these data) and by companies (67% said they understand little to nothing about what companies do with their personal data) (Pew Research Center, 2023[22]). A previous 2019 survey found that most felt that the potential risks of those data collections outweighed the benefits (81% for data collected by companies and 66% for data collected by the government) (Pew Research Center, 2019[23]).

In education, concerns that data may be used inappropriately, especially for commercial purposes, often stem from the increasing involvement of private technology companies in the operations of schools and universities. The sharing of administrative student and teacher data with technology vendors is often required to enable service provision for educational organisations, from administration (e.g. scheduling) to digital learning (e.g. instructional software and content, data dashboards) or testing (e.g. computer-based assessments). Local education authorities and schools often lack the time and technical expertise required to manage their expanding databases, and thus have to rely on third-party operators offering cloud-based computing and other online solutions. Even when privacy laws place third parties under the same obligations regarding privacy safeguards as these organisations, letting private companies manage personal education data remains a controversial topic (Polonetski and Jerome, 2014[24]).

The growing involvement of technology providers in schools is well illustrated by the case of Google. More than 20 million Google-powered Chromebook laptops have been deployed in schools globally since their launch in 2011 while, as of January 2021, the company counted 150 million students and teachers worldwide using its G Suite for Education (introduced in 2010 as Google Apps for Education), up from 70 million in 2017.2 The suite provides solutions for emailing, document management and networking, and is complemented by education-specific applications such as the virtual reality Google Expeditions tool. In the United States, the company pledges full compliance with FERPA and other privacy protection regulations which prohibit school service providers from utilising student information for targeted advertising. In line with these requirements, ads are turned off when users in primary and secondary schools are signed in to their G Suite accounts.3 However, concerns remain about tracking and targeted advertising when users transition to applications external to the suite, amid broader claims that the presence of technology companies in schools increasingly exposes children to targeted marketing and other commercial practices (Boninger and Molnar, 2016[25]) (Singer, 2017[26]).

Similarly, there are concerns about the ability of providers of virtual learning environments and Massive Open Online Courses (MOOCs) to eschew the student privacy protections that prevail in school and university settings by collecting data directly from learners, often under a privacy regime that applies instead to general commercial transactions (Zeide and Nissenbaum, 2018[27]).

A related worry is that the introduction of behaviour-monitoring and biometric identification technologies expands the typology of personal data collected in schools. In the United States, this has led a growing number of jurisdictions to introduce new legislation targeting industry and extending privacy safeguards to data other than traditional administrative student records. The Student Online Personal Information Protection Act (SOPIPA) passed in California in September 2014 was the first to put responsibility for protecting student data directly on industry by expressly prohibiting technology service providers from selling student data or creating student profiles for non-educational purposes. SOPIPA has since served as a model for the introduction of similar legislation in other states (Singer, 2014[28]) (Data Quality Campaign, 2017[29]). More generally, the Children’s Online Privacy Protection Act (COPPA) of 1998 prohibits the collection, use and dissemination of personal information from children under the age of 13 without informed, advance parental consent.

In the European Union, the GDPR and the revised Audiovisual Media Services Directive (AVMSD) include special provisions for the protection of minors’ personal data, particularly to prevent data processing for commercial purposes such as direct marketing, profiling and behaviourally-targeted advertising (Ronchi and Robinson, 2019[30]). Regulations of commercial uses of personal education data have tightened under the framework of the GDPR: under the new rules, digital service providers are considered data ‘processors’ while schools remain data ‘controllers’ and retain legal control over students’ data and decision power over third-party data requests (Articles 4-6, 35).4

A privacy breach can expose students and teachers to different types of harm, most worryingly to discriminatory practices. Harms resulting from a privacy breach can be objective or subjective, and involve economic, legal, psycho-emotional or reputational dimensions.5 The potential to cause harm is commonly used as a major criterion in determining the sensitivity of a data element and the need to apply privacy controls. However, delineating the legitimate boundaries for (harmless) information disclosure is often difficult as some uses of data that risk invading privacy can also have a social value (Solove, 2006[31]).

In addition to commercial marketing, potential harms that may arise from the misuse of student and teacher personal information include profiling and discrimination, identity theft, or emotional distress. For schools and universities with responsibilities in managing personal records, confidentiality breaches could bring reputational costs, the burden of investigating and remediating incidents, and associated financial losses.

The risk of profiling is of special relevance given the comprehensive and longitudinal nature of some administrative education datasets. Profiling refers to the use of data with the purpose of analysing personal characteristics or behavioural patterns, placing an individual or group of individuals in categories (profiles) and making predictions about their capacities, preferences or behaviours. Combined with automated decision-making, there is a risk that it enables machines to make decisions without human involvement and based on profiles derived from personal data (EU Data Protection Working Party, 2017[32]) (Future of Privacy Forum, 2017[33]) (Information Commissioner’s Office, 2017[34]). In fact, this is typically how AI in education operates, usually under the supervision of human beings (OECD, 2021[35]).

The key concern about profiling is that, based on these predictions, vulnerable individuals and communities suffer differential treatment or discrimination in practices such as access to social services and benefits, educational opportunities, hiring or insurance, among others. Examples include the denial of opportunity to students in certain ability categories, higher termination rates for benefit eligibility based on prior receipt of scholarships, or the filtering of job candidates by type of institution attended rather than by academic record. These could add to other forms of unfair treatment by gender, race or other personal characteristics. Student records starting from early schooling and spanning multiple years could be used, alone or in combination with data from other sources, to create student profiles that unfairly condition decisions and opportunities at later stages of their educational trajectories and beyond. This is notwithstanding the fact that education records are routinely and legitimately used for decisions such as admissions to post-secondary education institutions or the granting of financial aid. This concern is encapsulated in the concept of “algorithmic bias” (Baker, Hawn and Lee, 2023[36]).

A related concern is the visibility and permanence of records that signal a “negative” event in the trajectory of a student, which could in turn lead to denial of opportunity. This could for instance occur if a student’s disciplinary or behavioural records are used to assess the student’s suitability for a job later on. These dilemmas are similar to those posed by potential uses of health or juvenile court records. Another potential harm is that expectations about the accessibility of personal information lead to perverse incentives or reinforce risk-aversion in high-stakes situations. For example, students could be discouraged from enrolling in some tertiary-level courses if they suspect that prospective employers will use curricular choices as an indicator of certain personal preferences (e.g. political views, sexual orientation) and filter job applicants on that basis. Teachers, in turn, may be less inclined to try out innovative practices with more uncertain outcomes than business-as-usual teaching if they fear that a lack of positive results may affect their job mobility or promotion prospects. These concerns are above all an ethical issue: human beings misusing information that they did not have in the past.

At the core of these problems lies the fact that having more data is no guarantee of better inferences and fairer decisions. A selective and arbitrary use of the education records available – especially as these become more granular and cover more time points and more aspects of individuals’ traits and behaviours – can lead to biased decision-making and discriminatory practices. Greater data availability opens the door to harm if data are used inappropriately, but higher quality data and certain profiling applications can also serve to fight discrimination as well as to personalise and improve services (EU Data Protection Working Party, 2017[32]) (Future of Privacy Forum, 2017[33]).

Rather than banning or over-protecting the use of personal records because of their possible risks, the question is how to use them according to criteria that are valued and considered fair in society. Another challenge that data protection policies must address is to create individual and societal trust in the use of data and to find ways to address the concerns that people have, whether those concerns are supported by evidence or not.

All countries and jurisdictions for which we have information, except the United States, have a general, cross-sectoral law about privacy and data protection, which applies to education as well as to all other sectors of society. All members of the European Union, which must implement the EU GDPR, follow it, as do some neighbouring countries (e.g. the United Kingdom and Norway). The United States has a different approach: rather than a general law, it has a series of sectoral privacy and data protection laws, including one for education (FERPA). All these general laws are national. In terms of content, they are also all based on the data protection principles presented in the previous section.

About half of the countries/jurisdictions (13 out of 28) also have an additional education law (or binding rule) about privacy and data protection related to education data and settings. Usually, the law clarifies access to the data that countries collect and handle in their student information systems (EMIS) and other administrative systems. These rules set access restrictions for the systems, as well as how the data can be shared, used, etc. Countries with no central student information system (or with a small education system) tend not to have an additional regulation. Most of the guidelines issued by central authorities provide information to support schools and teachers in properly applying various country regulations about privacy and data protection.

In Europe, a few countries follow the GDPR only (e.g. Czechia, where the government collects aggregated student information only). Most countries have transposed the GDPR in their national laws, which remain general and apply to education. This is for example the case in England (Data Protection Act 2018 and UK GDPR), Finland (Data Protection Act), Iceland (2018 Act on Data Protection and the Processing of Personal Data) or Ireland (Data Protection Act 2018).

Under the general framework of the GDPR, some countries have also developed specific education rules – and usually a more restrictive approach than what the GDPR imposes. For instance, Sweden translated the GDPR into its Education Act. In France, the Education Code (Code de l’éducation) covers the data privacy of students, teachers, and staff, restricting the use of data and prescribing anonymisation rules for their information. Given France’s active digital education policy, officials within the ministry of education work in close collaboration with the National Commission for Information Technology and Liberties (Commission Nationale de l’Informatique et des Libertés - CNIL) to discuss new educational use cases.

In Austria, the Education Documentation Act (BilDokG – Bildungsdokumentationsgesetz) ensures data privacy at all educational levels for students, teachers, and staff. The law regulates data governance, from data collection stages to usage. For instance, when data are transferred from schools to statistical organisations, the information must be anonymised and the student identifier pseudonymised. Data usage beyond research or statistical purposes is forbidden. In addition, the 2021 ICT School Ordinance (IKT-Schulverordnung) regulates digital devices used in schools, for example by mandating the installation of a government-owned device management software on all publicly distributed digital devices. To prevent possible misuse of data by third-party digital service providers outside of Austria, digital service providers must have a contract with the government that ensures the usage of collected data is limited to pedagogical purposes.

A few European countries allow or require schools or municipalities to set their own additional privacy rules (in compliance with their national laws and the GDPR). For example, Italian schools are asked to have data privacy rules which are monitored by external data protection officers. Dutch schools appoint data protection officers to set privacy policies and raise awareness among school staff on data privacy. In Spain, while the Organic Law 3/2018 transposed the GDPR into Spain’s national law and specifically addressed data protection in education, further rules or guidelines on specific uses of technologies apply in the autonomous regions depending on their educational context. The national government nevertheless provides guidelines on data privacy and protection through its AsequraTIC website.

Outside of Europe, countries also have laws that are sometimes inspired by or aligned with the EU GDPR. In Brazil, the Personal Data Protection General Law (LGPD) shows several parallels with the GDPR, even though the LGPD has a stricter definition of personal data. Türkiye’s Personal Data Protection Law (KVKK) was inspired by the GDPR, and Türkiye has no specific rules or guidelines about education. Finally, while Chile has had a data protection law for decades, a bill covering general data protection and privacy based on the GDPR was under discussion as of early 2024.

In Canada, two main documents ensure the general protection of data and privacy. The Freedom of Information and Protection of Privacy Act (FIPPA) guarantees individuals the right to access their own information, while the Personal Information Protection and Electronic Documents Act (PIPEDA) requires individual consent to the collection, disclosure and use of personal information. Specific data protection laws on education are the responsibility of the provinces and territories. For instance, New Brunswick’s Right to Information and Protection of Privacy Act specifically addresses data protection rules and guidelines in education, such as restrictions on commercial vendors’ use of student or staff data that schools should confirm before signing procurement contracts.

In Japan, the Act on the Protection of Personal Information (APPI) governs the handling of personal information (including education data when it constitutes personal information). Until 2022, the APPI regulated only the personal information held by the private sector, while the personal information held by government bodies and incorporated administrative agencies was governed by the Act on the Protection of Personal Information Held by Administrative Organs (APPIHAO) and the Act on the Protection of Personal Information Held by Incorporated Administrative Agencies (APPIHIAA). An amendment in May 2021 integrated those two acts into the APPI, made the reporting of data leaks mandatory and introduced more severe penalties for non-compliance with the orders of the Personal Information Protection Commission (PPC). Since 2023, the APPI also covers the handling of personal information held at the subnational government level.

The United States has taken a sectoral approach to data protection. Education is one of the sectors with a data protection law: the Family Educational Rights and Privacy Act (FERPA), mentioned above. It is supplemented by the Protection of Pupil Rights Amendment (PPRA), the Children’s Internet Protection Act (CIPA), and the Children’s Online Privacy Protection Rule (COPPA), which regulate different aspects of data protection and privacy. Because they concern children, the latter two laws also concern education. States have the autonomy to set up their own general or specific laws or rules about data protection and privacy above and beyond the federal ones. The federal Department of Education publishes guidelines for schools and other stakeholders explaining how to comply with the data protection measures. Other specific guidelines and/or laws on data collection in digital tools and protection measures are under the responsibility of US states. For example, California has a cross-cutting data protection act and specific state guidance on data protection.

In sum, all countries have well-developed privacy and data protection regulations, which typically come with some guidelines on how to implement their data protection law, even though those guidelines may remain general and not focused on school-level implementation.

Under countries’ data protection laws, some aspects of students’ and education staff’s privacy and data are protected. An alternative to a general law could have been specific privacy and data protection rules for students and/or for education staff. Those two questions were asked in the OECD data collection (and validation meetings with countries). When it comes to the statistical or research sharing of their data with third parties, students and school staff (mainly teachers) are covered by similar regulation.

However, because some students are children, they usually also benefit from additional regulations that do not apply to school staff. Some countries also have specific, separate laws about handling the data of minors/children (e.g. the United States), which apply to children in school. About 24 (out of 29) countries/jurisdictions report rules (17) or guidelines (7) on the protection of data and privacy specifically of student data, above and beyond their general data protection and privacy regime.

This is less often the case for school staff and teachers, with 12 countries reporting specific data protection regulation and 6 reporting guidelines. While the handling of staff and student data is typically the same, staff are in a different position because the use of their data by their employers depends on their employment contracts (and countries’ employment laws) rather than on broad data protection regimes. In the Nordic countries, where municipalities hire teachers, this is left to local employment contracts and collective bargaining. In the United States, for example, this would be addressed by federal employment law and employment contracts.

As mentioned for student information systems, many countries use a tiered model for accessing the data they collect, which establishes a clear differentiation of access rights both among data custodians and among external parties based on their roles, needs and responsibilities. By clarifying who should have access to what types of data and for what purposes, a tiered model can inform the design of an information system and help it embed privacy and data security principles. Role-based access models align with the principle of data minimisation recognised in modern privacy legal frameworks such as the GDPR. For instance, in Article 25, the regulation calls for controllers to hold and process only the data absolutely necessary for the completion of their duties, as well as to limit access to personal data to those who need it to carry out the processing. Many education systems publicise the few people who have access to all data.

Within an education agency, a role-based access model means that identifiable student and teacher records within an information system will only be accessible to personnel who need that information to support their professional roles. Job descriptions should detail the rights of each type of user and specify with enough granularity the tasks requiring access to personal records. Staff may be asked to sign obligations binding them to protect data confidentiality before being granted access to sensitive data.

Role-based access models can be used to organise points of entry into an information system for different stakeholders in the education system, as well as reporting and data visualisation functionalities. For example, the work of an elementary school teacher can benefit from timely access to recent student data on attendance, grades and performance on various assessments, but does not require access to detailed information on medical histories or school transfers for all students in the school. In turn, the administrator of a policy programme targeting students with a specific profile, such as non-native speakers, can arguably better organise placements into such programmes when having access to past education records and identifying information on students’ family backgrounds. Meanwhile, an analyst in a public agency’s research or evaluation unit who is responsible for generating aggregated reports of student performance and submitting them to higher-level authorities would need access to the performance results but not the direct identifiers for individual students. Therefore, rather than allowing each employee or external user access to all electronic student records or restricting access to needed elements one user at a time, managers of education information systems can grant access to a set of data elements based on role differentiation (NCES, 2011[7]).
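
A role-based access model of this kind can be thought of as a mapping from roles to the data elements they may see, applied as a filter before any record leaves the information system. The sketch below is purely illustrative: the roles, field names and records are hypothetical, not an actual schema used by any education agency.

```python
# Illustrative role-based access filter for student records.
# Roles and permitted fields are hypothetical examples, not a real schema.

ROLE_PERMISSIONS = {
    "classroom_teacher": {"student_id", "attendance", "grades", "assessment_scores"},
    "programme_administrator": {"student_id", "home_language", "prior_schools", "programme_eligibility"},
    "research_analyst": {"grades", "assessment_scores", "grade_level"},  # no direct identifiers
}

def filter_record(record: dict, role: str) -> dict:
    """Return only the data elements the given role is entitled to see."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}

student_record = {
    "student_id": "S-001",
    "attendance": 0.93,
    "grades": {"maths": "B"},
    "assessment_scores": {"reading": 512},
    "home_language": "Spanish",
    "prior_schools": ["School A"],
    "programme_eligibility": True,
    "grade_level": 8,
    "medical_notes": "confidential",
}

print(filter_record(student_record, "classroom_teacher"))
print(filter_record(student_record, "research_analyst"))  # identifiers stripped
```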

Most countries use this tiered access approach for their administrative data systems, such as student information systems or admission management systems. Importantly, all the privacy and data protection laws presented above prevent schools or educational agencies from sharing personal data or outcomes for purposes other than school and educational operations. (One exception for data sharing is access for research, which is elaborated below.)

Increasingly digitalised learning environments call for new solutions to ensure privacy-respectful and lawful uses of student and teacher personal data. Technology is increasingly used to ensure this is the case, as part of technical “interoperability” layers that limit the personal information shared with third (and notably commercial) parties.

For example, since 2020 the Netherlands has introduced a new student identifier (different from the national identifier) and data exchange layer that schools use to prevent commercial vendors from knowing students’ personal identity: the ECK iD. The Flemish Community of Belgium has a similar approach. Another approach is the Gestionnaire d’Accès aux Ressources (GAR) developed by the French Ministry of Education, which acts as a security filter ensuring that the data exchanges between schools and providers of digital learning resources comply with the proportionality and relevance principles recognised by national and EU privacy regulations (Box 8.2). The solution also establishes contractual agreements and technical and legal standards requiring, for instance, that resource providers refrain from reusing personal data for commercial purposes and enable data subjects to easily retrieve their data.
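
Whatever their actual implementation, the general idea behind such exchange layers can be illustrated with a pseudonymisation step: the real identifier never leaves the trusted layer, and each vendor only receives a stable but opaque pseudonym derived from it. The sketch below is a generic illustration using a keyed hash; it does not describe the actual ECK iD or GAR mechanisms, and all identifiers and keys are invented.

```python
import hashlib
import hmac

# Secret key held only by the trusted exchange layer (illustrative value only).
EXCHANGE_LAYER_KEY = b"replace-with-a-securely-stored-secret"

def vendor_pseudonym(national_student_id: str, vendor_id: str) -> str:
    """Derive a stable, vendor-specific pseudonym from the real identifier.

    Different vendors receive different pseudonyms for the same student,
    so they cannot pool their data to rebuild a cross-service profile.
    """
    message = f"{vendor_id}:{national_student_id}".encode()
    return hmac.new(EXCHANGE_LAYER_KEY, message, hashlib.sha256).hexdigest()

# The same student appears under unlinkable identifiers to two vendors.
print(vendor_pseudonym("NL-1234567", "math-platform"))
print(vendor_pseudonym("NL-1234567", "reading-app"))
```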

While tiered access policies and technological solutions alleviate the burden of implementing data protection for school staff, providing appropriate guidance to teachers, school leaders and families is another important action point. Most countries have these types of strategies and provide at least some level of guidance. Providing guidance is not the same, however, as making communication efforts that actually reach school staff and teachers. Regulation and policies are not always well known or understood. In many cases, the law is only clarified by court decisions and can remain ambiguous until then. This may lead education stakeholders to implement either an over- or under-protective approach to privacy and data protection. One way to develop a trustworthy culture in the use of data and digital technology would be to have more institutional support on these matters. This is usually the case at the governmental level, for the use of digital administrative tools, but less often the case at the school level. Some countries expect school inspections to cover this dimension alongside all other ones.

A few countries have a specific workforce to support schools and verify the implementation of their data (and digital technology) policies. Austria includes staff trained on IT and data protection issues in its school inspections. Italy mandates that schools have an external data protection officer. In France, the national data protection agency and other services of the ministry take a pro-active approach to verifying how data and digital tools are used in schools. Moreover, while the inspectors may or may not get specific training on this and focus on many other aspects, regular school inspections cover data protection in a few countries (the Flemish Community of Belgium, Chile, Ireland, and New Zealand) (Figure 8.2).

While enhancing the real-time use of the data generated by countries’ digital ecosystem to inform decisions should be an objective, a strong digital education ecosystem also generates data that countries can use for research, evaluation and learning. For this purpose, the collected data should be made available to researchers under strong privacy rules.

Data have long been collected in education for statistical purposes. The digitalisation of data collected by different institutions as part of their operations has multiplied the quantity and quality of relevant data that can be used for research and improvement purposes. In particular, the emergence of new generations of longitudinal student information systems that collect records about individuals over a number of years, and sometimes from kindergarten to the workforce, have opened unprecedented research opportunities (Figlio, Karbownik and Salvanes, 2016[38]; Dynarski and Berends, 2015[39]). Increasingly, different data sources can be linked to offer a better understanding of the determinants and effects of contextual factors on educational outcomes, or, conversely, of educational trajectories on other outcomes (income, health, employment, etc.).

Administrative data refer to data collected by governments and other public entities as part of their operations (service delivery or law enforcement). They typically belong to two categories (Office of Management and Budget (United States), 2016[40]):

  • Large-scale administrative data typically cover a very large share of (or the entire) population concerned by a programme. They can be cross-sectional or longitudinal. In education, longitudinal information systems following each student at the school level are a good example. Large-scale administrative data can also be collected about teachers and others.

  • Programme-specific administrative data typically cover the recipients of a programme as part of the delivery of this programme. This may for example be the case for student grants and other programmes targeted to a sub-population of students, for example involved in a government programme.

Administrative data differ from survey data in that they are not primarily collected for research purposes – and they often have a coverage that cannot be matched by a survey. While some administrative data are collected for statistical purposes, most of them are not. Digitalisation makes it possible to harness the potential of these data for research, provided researchers can analyse them. There are several differences between survey and administrative data: an important one is that the research survey principle of “notice and consent” is largely not appropriate for administrative data.

Countries typically use both data-focused and governance-focused controls to make administrative data accessible to researchers. These solutions are fully complementary within a broader privacy protection strategy.

Data-focused controls consist in treating data prior to their release or sharing. They reduce privacy risks by transforming data, for example by removing or obscuring the association between the data subjects and the data elements. This involves a range of data de-identification techniques, including suppression, blurring, perturbation, randomisation or sub-sampling. These techniques target formal identifiers but also include ways to distort information and to prevent statistical linkages. Despite their limitations, the de-identification of sensitive data prior to their release remains an essential component of the privacy protection toolkit (Cavoukian and El Emam, 2011[41]). De-identified administrative datasets are indeed safely used in many countries for research and evaluation purposes in education, health and other fields. Even if records are de-identified6 prior to the release or sharing of a dataset, the risk of re-identifying7 individuals or disclosing sensitive information about them may persist as data collections proliferate within and outside educational settings and as new analytic techniques increase opportunities to mine, link and draw inferences from data.
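
As a minimal illustration of the data-focused controls listed above, the sketch below suppresses a direct identifier, generalises two quasi-identifiers (postcode truncated, birth date reduced to birth year) and then reports the size of the smallest remaining group sharing the same quasi-identifier values, a crude k-anonymity indicator. It is a toy example with invented fields, not a substitute for proper statistical disclosure control.

```python
from collections import Counter

def de_identify(record: dict) -> dict:
    """Apply simple suppression and generalisation to one student record."""
    out = dict(record)
    out.pop("name", None)                          # suppress direct identifier
    out["postcode"] = out["postcode"][:2] + "xxx"  # generalise location
    out["birth_year"] = out.pop("birth_date")[:4]  # keep year of birth only
    return out

records = [
    {"name": "A", "birth_date": "2006-03-14", "postcode": "75011", "grade": 10},
    {"name": "B", "birth_date": "2006-07-02", "postcode": "75012", "grade": 10},
    {"name": "C", "birth_date": "2005-11-30", "postcode": "75018", "grade": 11},
]

released = [de_identify(r) for r in records]

# Crude k-anonymity check: size of the smallest group sharing the same
# combination of released quasi-identifiers. k == 1 means at least one
# individual remains unique on these attributes despite the treatment.
quasi = ("postcode", "birth_year", "grade")
k = min(Counter(tuple(r[q] for q in quasi) for r in released).values())
print("Smallest equivalence class size (k):", k)
```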

Governance-focused solutions, in turn, seek to restrain the interactions that custodians and users have with the data, both by regulating the conditions for data access and use and by increasing their awareness and capacity to address privacy risks. Data governance can help protect privacy by establishing effective controls and procedures in at least four areas: physical and IT data security; tiered access models; supervised access and licensing solutions; and privacy awareness, training and communication. A good governance model implements actions in all four areas and operates in combination with data-focused solutions.

Most countries share at least some of their administrative data under usual statistical data sharing laws – and often use different data-governance techniques that make it very difficult to re-identify individuals. As noted above, most data protection and privacy laws have a “research exception” that allows for the sharing of de-identified data collected by public agencies with researchers – under certain conditions.

Countries report different modes of access for researchers though.

21 (out of 29) give access to (at least some of) their administrative data under the same conditions (rather than ad hoc application processes) to all researchers. In some cases, these conditions can be restrictive: for example, in Washington state (United States), researchers can only access data if the data have not already been provided to other researchers to explore similar research questions. In many cases, the difficulties in sharing are due to a lack of human resources (and budget). While most countries give similar access to public and private researchers, a few countries only give access to public researchers (Chile, the French Community of Belgium and Türkiye).

The documentation of administrative datasets is key to giving education researchers equal access to diverse administrative datasets. Documenting a dataset mainly consists in providing a public dictionary of the data it contains. Given that administrative systems are typically not open to the public, when a dataset is not documented researchers will typically not know what research questions they could answer by using it, even if they can apply for access to the data, unless they know the people who maintain or use the systems. While the documentation of datasets is a burdensome exercise that requires human resources and budget, it is key for the use of administrative data by researchers. Slightly less than half of the countries for which we have information (13 out of 29) document all or most of their administrative education datasets. In countries that do not document their administrative datasets, there cannot truly be fair and equal research access to data.

In the past, most data collected within schools were only accessible to schools and government agencies. Because schools now use a variety of proprietary digital tools and resources, some companies have access to student data. Access to and use of those data are regulated by privacy and data protection laws, usually with special requirements for the data of minors. One common prohibition is for vendors to use these data for marketing and commercial purposes.

However, the use of specific digital tools and resources in schools (or for school education), for example adaptive learning tools, also generates data that commercial vendors can use. Increasingly, technical solutions are put in place that prevent or limit the ability of private companies to know the identity of their users. Even so, using those data allows vendors to improve their algorithms and services. These data are typically the property of vendors.

One question for countries is whether some of the data collected by commercial vendors within (public) schools should be made available either for research or for the development of new education products. For example, should the use of adaptive learning systems in public schools (hypothetically) have the potential to lead to a breakthrough in understanding how to support student learning, for example by having students follow certain learning sequences, it might be beneficial for education systems to make some of the data collected by commercial vendors accessible to researchers and even to other companies. In many sectors, the sharing of data and the establishment of “data spaces” made of data collected by companies and reused by other organisations is an aspiration. This is, for example, what the EU Data Governance Act, passed in 2022 and applicable since September 2023, attempts to promote, with measures to increase trust in data sharing, for example by allowing data intermediaries and facilitating the reuse of public data.8

In Austria, in principle no data can leave schools: neither the commercial providers of digital tools/resources nor anyone else should be able to access and reuse data collected in schools, including process data. It is the only country that reported a rule on accessing or sharing data collected by commercial vendors.

The interviews with government officials highlighted the lack of mechanisms and awareness about the possible benefits of reusing some of the data collected by business providers – a measure that would have to balance incentives for commercial developers and societal benefits.

To conclude, a key aspect of countries’ data governance should be consideration of giving researchers wide access to their administrative data, in compliance with privacy regulation and under equitable conditions. Fair access for all researchers requires the public documentation of datasets. Governments should also consider to what extent they should develop incentives for commercial vendors to share some of the data they collect in the context of public education – not so much the data about students, but the use and process data that could help researchers or other vendors improve teaching and learning.

Figure 8.3 provides a summary of the main access policies within countries as of early 2024.

While data governance mainly focuses on the handling and sharing of data, new concerns have emerged about the governance of technology itself, notably algorithms that support automated decision-making (and AI algorithms). OECD countries and Brazil have few, if any, automated decision-making systems among their education management tools. There are a few rule-based algorithms used for making some decisions or allocations, but countries usually note that the algorithms only inform rather than make the decision. Figure 8.4 presents the extent to which rule-based algorithms are used in different categories of systems provided by governments (acknowledging that no AI-based systems were available yet). Regarding AI algorithms, the question was understood as the use of algorithms that can detect, diagnose or act on different educational aspects. Sometimes, the use of those advanced rule-based algorithms is merely about casting light on some aspects of the data (detecting issues through dashboards) or about making sure that credentials are verifiable.

There may be more sophisticated systems in schools and classrooms, for example adaptive learning systems (including intelligent tutoring systems) and, here and there, some of the advanced uses of AI in the classroom presented in the Digital Education Outlook 2021 (Baker, 2021[42]; OECD, 2021[35]; D’Mello, 2021[43]; Dillenbourg, 2021[44]) which predate the public emergence and use of generative AI (OECD, 2023[45]).

While AI in education presents many opportunities, there are also several possible risks.

One challenge is that the technology may not be as effective as one would wish: the models may not perform well at what they are designed to do. In many cases, this does not matter, as the technology may still be helpful and do no harm; it may also still perform better than human beings. For some types of decisions, however, for example those that are high stakes for individuals, tolerating error is not acceptable (even though this should be compared to human error for a similar task). One possible consideration for guidance or regulation would thus be the effectiveness of technology solutions. Countries could consider asking education technology developers to disclose the level of effectiveness of their tools, verifying it themselves, or having it verified by accredited third parties. Whether this is worth doing depends on the risk posed by error.
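As a minimal illustration of what disclosing effectiveness could involve, the sketch below compares a hypothetical tool’s predictions on held-out outcomes against a naive baseline; the numbers are invented, and any real verification scheme would rely on far richer evidence and agreed protocols.

```python
import numpy as np

# Hypothetical held-out outcomes (1 = student passed) and two sets of predictions:
# the vendor tool's and a naive baseline that always predicts the majority class.
actual         = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
tool_predicted = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 1])
baseline       = np.ones_like(actual)  # naive rule: always predict "pass"

tool_accuracy = (tool_predicted == actual).mean()
baseline_accuracy = (baseline == actual).mean()

# Disclosing this kind of comparison (ideally verified by a third party) shows
# whether a tool adds value beyond a trivial rule, before any high-stakes use.
print(f"tool accuracy: {tool_accuracy:.0%}   baseline accuracy: {baseline_accuracy:.0%}")
```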

A second challenge is that technology may increase inequity or unfairness, two issues that are antithetical to educational objectives. The worsening of inequity may come from the fact that some algorithms, for example those used for adaptive learning, work better for some students than for others – and thereby widen rather than narrow the gap between those groups. In some cases, this may come from algorithmic bias, which occurs when an algorithm encodes (typically unintentionally) the biases present in society, producing predictions or inferences that either do not perform the same way for all groups of a population or are clearly discriminatory towards specific groups (Baker, Hawn and Lee, 2023[36]). Countries should thus consider measures to identify whether these types of biases are present when they use digital technology, and to address them.
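One simple way to detect such differential performance is to disaggregate a model’s error rates by group, as in the sketch below; the data and group labels are hypothetical, and real audits would use more complete fairness metrics.

```python
import pandas as pd

# Hypothetical evaluation set: model predictions of "at risk" alongside the
# observed outcome and a group attribute used only for auditing purposes.
evaluation = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1,   0,   1,   0,   1,   0,   0,   1],
    "predicted": [1,   0,   0,   0,   0,   1,   0,   0],
})

def error_rates(df: pd.DataFrame) -> pd.Series:
    """False-negative and false-positive rates for one group."""
    positives = (df["actual"] == 1).sum()
    negatives = (df["actual"] == 0).sum()
    fn = ((df["actual"] == 1) & (df["predicted"] == 0)).sum() / max(positives, 1)
    fp = ((df["actual"] == 0) & (df["predicted"] == 1)).sum() / max(negatives, 1)
    return pd.Series({"false_negative_rate": fn, "false_positive_rate": fp})

# Large gaps between groups flag a potential algorithmic bias to investigate further.
print(evaluation.groupby("group")[["actual", "predicted"]].apply(error_rates))
```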

Other challenges are ethical and relate to human rights and dignity and to democratic values. In 2019, the OECD adopted a Recommendation on Artificial Intelligence that promotes the “use of AI that is innovative and trustworthy and that respects human rights and democratic values” and provides principles on how governments and other actors can “shape a human-centric approach to trustworthy AI” (OECD, 2019[46])9. In 2021, UNESCO adopted a global Recommendation on the Ethics of Artificial Intelligence (UNESCO, 2021[47]).

This section presents some of the guidelines and rules that are emerging across countries. Regulation has the advantage of being enforceable, but it could stifle innovation and prevent technological development where it applies. Guidelines have the advantage of giving direction and shaping expected behaviours: they allow for societal dialogue and for exploring different uses of technology, but they have the disadvantage of being non-binding and thus leave some serious issues to ethics.

As of 2024, France is the only country that already implements binding rules in this area. These rules apply to algorithms provided by public agencies and schools. France’s digital law requires that algorithms used by public agencies be explainable to lay people and open source. Within schools, part of the restrictions comes from the ministry of education. For example, AI cannot be used for behavioural studies, but only for pedagogical purposes. AI systems supporting engagement based on gamification can be used, but systems with eye-tracking functionalities, for example, are only allowed for research purposes (see (D’Mello, 2021[43]) for a review of these AI systems). Overall, high-stakes, highly automated systems are forbidden.

As part of its digital strategy, the European Union plans to regulate artificial intelligence (AI) to facilitate its development and use: the Artificial Intelligence Act should be passed in early 2024. The proposed Act adopts a risk-based approach and requires a series of disclosures and assessments from AI developers. Similar to medical devices, it proposes that AI software be authorised before being put on the market. To support innovation, it will also provide temporary exemptions for the testing of AI (Box 8.3). These “regulatory sandboxes” appear as a promising way to develop AI systems (OECD, 2023[48]). Should the Act be adopted as planned, a careful definition of the forbidden systems will need to be given. In the negotiating position, the proposed risk-based approach seems to apply by sector – with education included as a “high-risk” area.

The United States has also released a blueprint for legislation about AI systems that is meant to inform government regulation as well as organisational practices. It focuses on the effectiveness of algorithms, that is, making sure that they do what they are supposed to do; the prevention of bias against some groups; people’s agency over how their data are handled; disclosure that an algorithm is being used; and the option of human support when a problem arises (Box 8.4).

Korea has issued “ethical principles” on the use of AI that play the same role (Box 8.5). And New Zealand has developed a charter for its government agencies that follows similar standards on a voluntary basis: explain algorithms, disclose how data are secured, engage with stakeholders, identify and manage possible bias, and keep a “human in the loop” (Box 8.6).

This chapter highlights that countries tend to have robust privacy and data protection policies. All have privacy and data protection regulations in place, and in many cases education-specific ones. A few countries make schools responsible for the implementation of data protection, and while most do not proactively verify the implementation of the law, they limit the possibilities of privacy breaches in many different ways. Most apply tiered-access policies to their digital tools, thus limiting access to personal data. Some make it mandatory for schools to have a data protection officer (Italy) or make data protection a dimension of their school inspections (Ireland). Increasingly, technology itself is a way to protect privacy, with technology layers that manage students’ and teachers’ identity and prevent third parties from identifying the people using their digital platforms.
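A common technique behind such identity layers is to give each platform a stable pseudonym derived from the real identifier with a secret key, so that vendors can track usage without knowing who the user is or linking the same student across services. The sketch below illustrates the idea with a keyed hash; the identifier format, platform names and key are hypothetical.

```python
import hashlib
import hmac

# Secret key held by the identity layer (e.g. a ministry's sign-on broker) and
# never shared with third-party platforms. The value here is illustrative only.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(national_student_id: str, platform: str) -> str:
    """Derive a stable, platform-specific pseudonym from a real identifier.

    A keyed hash (HMAC) means the platform cannot reverse the pseudonym, and
    including the platform name in the message prevents linking the same
    student across different vendors.
    """
    message = f"{platform}:{national_student_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]

print(pseudonymise("FR-1234567", "adaptive_maths_app"))
print(pseudonymise("FR-1234567", "reading_platform"))  # different pseudonym
```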

Most countries also de-identify and make (some of their) administrative datasets available for research. While there is a risk of re-identification, there is unprecedented value in analysing these data for system improvement and innovation. While most countries provide some level of access to their datasets, they should also document them to allow for a wider use by their research community.

Education systems need to find privacy-protective models of education data use that can support educational research and improvements in teaching and learning in order to raise performance and reduce achievement gaps. In the status quo, the use of administrative data for system-level monitoring and evaluation poses few problems with regard to privacy or data security. However, the innovation frontier calls for highly granular and timely information to inform education policy and practice, which often implies a greater reliance on individual-level, personal data and greater privacy risks.

The challenge for education agencies is to balance privacy and data security requirements with opportunities for improving research and for shaping teaching and learning practices. With the appropriate safeguards, education stakeholders should be able to access and use administrative data for legitimate purposes and in a timely manner. The benefits of such uses of data, currently overshadowed by privacy risks, should also become more visible.

Education systems can adopt a risk-management approach to address the policy tensions between privacy and other important objectives. A risk-management approach recognises a diversity of uses of personal education records, their potential benefits, and their associated privacy risks, and serves to reconcile legitimate privacy concerns of students, families and educators with the use of education data to improve educational outcomes. Governments across the OECD are increasingly adopting risk-management approaches in the context of digital security (OECD, 2016[17]), as called for in the OECD Revised Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (OECD, 2013[2]) and the OECD Digital Security Risk Recommendation for Economic and Social Prosperity (OECD, 2015[52]).

A first step in this direction is to break away from the expectation of fully eliminating risk in the use of education data. Unless one entirely disregards the analytical value of data, scenarios with zero privacy risk are unrealistic. Reducing the granularity of information to protect confidentiality most often implies diminished accuracy and utility of data. Managing risk means accepting that there will be a residual privacy risk in any useful data release, and also evaluating and adopting the most suitable privacy protection measures in light of the intended data uses and potential threats. Another requirement is to shift the focus from privacy controls at the stages of data collection and transformation, to controls on data access, sharing and use (Elliot et al., 2016[53]; Altman et al., 2015[54]).
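Residual risk can be made measurable. A common, if simplified, indicator is the size of the smallest group of records sharing the same quasi-identifiers (the “k” in k-anonymity): the sketch below, with invented data, shows how coarsening a release raises k (lower risk) at the cost of analytical detail.

```python
import pandas as pd

# Hypothetical de-identified extract with two quasi-identifiers and one score.
released = pd.DataFrame({
    "region":       ["75", "75", "75", "69", "69", "69"],
    "birth_cohort": [2008, 2008, 2008, 2008, 2006, 2006],
    "score":        [512, 487, 530, 476, 501, 495],
})

def smallest_group(df: pd.DataFrame, quasi_identifiers: list) -> int:
    """Size of the smallest equivalence class (the 'k' in k-anonymity)."""
    return int(df.groupby(quasi_identifiers).size().min())

# A class of size 1 means someone sharing those attributes is unique in the
# release and therefore easier to re-identify by linkage with other data.
print("k with region + cohort:", smallest_group(released, ["region", "birth_cohort"]))  # more detail, higher risk
print("k with region only:    ", smallest_group(released, ["region"]))                  # less detail, lower risk
```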

A wide range of strategies and tools exist to support the implementation of a privacy risk-management approach. Many of these privacy protection strategies are already applied in other sectors where public agencies engage in extensive data sharing for research and evaluation purposes, most notably in health (OECD, 2013[55]; OECD, 2015[56]). The portfolio includes data-focused solutions (i.e. treating data prior to their release or sharing) as well as governance solutions (i.e. controls on data access and data use). Effective privacy protection requires applying both types of strategies in conjunction.

While more recent a concern, the governance of algorithms underlying digital education tools and resources is increasingly important as education systems become more digitised. Here again, a risk-management approach should be adopted. The main risk with algorithms lies in algorithmic bias, whereby algorithms perform differently or produce different outcomes for different population groups. Baker, Hawn and Lee (2023[36]) review the existing research on algorithmic bias and highlight the need for countries to balance privacy and data protection against possible algorithmic bias, acknowledging that bias can only be identified and addressed if personal (and sometimes even sensitive) data are collected.

A second risk lies in the effectiveness of smart technologies, notably when they make automated decisions or high-stakes recommendations to humans. While algorithms can have the advantage of applying rules systematically, human beings should still oversee them – and effectiveness should be demonstrated when decisions are high stakes for individuals.

A third risk relates to social acceptance and public trust. Explaining and publicising how smart technologies work and how they handle data, and engaging with education stakeholders on how they work, are examples of approaches to addressing this issue. Being transparent about how algorithms work (explainability) and about the values and criteria they apply is becoming increasingly important.

Those are the pillars of most existing guidelines and principles in this area, including the OECD Opportunities, guidelines and guardrails about the effective and equitable use of AI in education (see OECD, 2023[1]).

In which cases technology per se would require regulation, and along what lines, is one of the areas that international discussions can inform. Countries also have other policies at their disposal to address these issues and ensure that appropriate digital tools are available to their education stakeholders. For example, countries could embed requirements on the performance or fairness of digital tools in their procurement procedures.

Privacy and data protection, as well as policies regarding the use of digital tools and resources, represent mounting responsibilities for school leaders and school staff. Beyond regulation, most countries should provide schools with guidelines explaining the law, along with use cases and examples. Most countries take a reactive approach and wait to respond to complaints and privacy breaches. Providing a mix of proactive verifications, not necessarily to sanction but to develop people’s capacity in the field, would be a way to create more trust in how schools handle data, and less fear about using digital tools and resources.

While privacy and data protection are an imperative, they should also be seen as a condition for and an enabler of a trustworthy digital transformation of education.

References

[4] Abrams, M. (2014), The Origins of Personal Data and its Implications for Governance, The Information Accountability Foundation, http://informationaccountability.org/wp-content/uploads/Data-Origins-Abrams.pdf (accessed on 12 April 2018).

[54] Altman, M.; A. Wood; D. O'Brien; S. Vadhan; U. Gasser (2015), “Towards a Modern Approach to Privacy-Aware Government Data Releases”, Berkeley Technology Law Journal, Vol. 30/3, pp. 1967-2072, https://doi.org/10.15779/Z38FG17.

[42] Baker, R. (2021), “Artificial intelligence in education: Bringing it all together”, in OECD Digital Education Outlook 2021: Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots, OECD Publishing, Paris, https://doi.org/10.1787/f54ea644-en.

[36] Baker, R., A. Hawn and S. Lee (2023), “The state of the situation and policy recommendations for algorithmic bias”, in Digital Education Outlook 2023, OECD Publishing.

[25] Boninger, F. and A. Molnar (2016), Learning to be Watched: Surveillance Culture at School, National Education Policy Center, Boulder, CO, http://nepc.colorado.edu/publication/schoolhouse-commercialism-2015.

[5] Buckley, J.; L. Colosimo; R. Kantar; M. McCall and E. Snow (2021), “Game-based assessment for education”, in OECD Digital Education Outlook 2021: Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots, OECD Publishing, Paris, https://doi.org/10.1787/9289cbfd-en.

[30] Burns, T. and F. Gottschalk (eds.) (2019), Child protection online, Educating 21st Century Children: Emotional Well-being in the Digital Age, OECD Publishing, Paris, https://doi.org/10.1787/b7f33425-en.

[57] Calo, R. (2011), “The Boundaries of Privacy Harm”, Indiana Law Journal, Vol. 86/3, https://www.repository.law.indiana.edu/ilj/vol86/iss3/8/.

[41] Cavoukian, A. and K. El Emam (2011), Dispelling the Myths Surrounding De-identification: Anonymization Remains a Strong Tool for Protecting Privacy, http://www.ipc.on.ca/images/Resources/anonymization.pdf.

[37] Commission Nationale de l’Informatique et des Libertés (2017), Rapport d’activité 2017, https://www.cnil.fr/sites/default/files/atoms/files/cnil-38e_rapport_annuel_2017.pdf (accessed on 7 November 2019).

[43] D’Mello, S. (2021), “Improving student engagement in and with digital learning technologies”, in OECD Digital Education Outlook 2021: Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots, OECD Publishing, Paris, https://doi.org/10.1787/8a451974-en.

[29] Data Quality Campaign (2017), Education Data Legislation Review The Role of State Legislation, Data Quality Campaign, Washington, DC, https://2pido73em67o3eytaq1cp8au-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/DQC-Legislative-summary-0926017.pdf (accessed on 9 January 2018).

[13] de Montjoye, Y.; C.A. Hidalgo; M. Verleysen and V. Blondel (2013), “Unique in the Crowd: The privacy bounds of human mobility”, Scientific Reports, Vol. 3/1, p. 1376, https://doi.org/10.1038/srep01376.

[12] de Montjoye, Y.; L. Radaelli; V.K. Singh and A.S. Pentland (2015), “Unique in the shopping mall: on the reidentifiability of credit card metadata.”, Science, Vol. 347/6221, pp. 536-9, https://doi.org/10.1126/science.1256297.

[44] Dillenbourg, P. (2021), “Classroom analytics: Zooming out from a pupil to a classroom”, in OECD Digital Education Outlook 2021: Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots, OECD Publishing, Paris, https://doi.org/10.1787/336f4ebf-en.

[11] Duhigg, C. (2012), “How Companies Learn Your Secrets”, The New York Times, http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html (accessed on 5 February 2018).

[15] Dwork, C.; A. Smith; T. Steinke and J. Ullman (2017), “Exposed! A Survey of Attacks on Private Data”, Annual Review of Statistics and Its Application, Vol. 4/1, pp. 61-84, https://doi.org/10.1146/annurev-statistics-060116-054123.

[39] Dynarski, S. and M. Berends (2015), “Introduction to Special Issue: Research Using Longitudinal Student Data Systems: Findings, Lessons, and Prospects”, Educational Evaluation and Policy Analysis, Vol. 37/1, https://doi.org/10.3102/0162373715575722.

[53] Elliot, M.; E. Mackey; K. O'Hara and C. Tudor (2016), The anonymisation decision-making framework, UK Anonymisation Network, Manchester, http://ukanon.net/wp-content/uploads/2015/05/The-Anonymisation-Decision-making-Framework.pdf (accessed on 4 January 2018).

[32] EU Data Protection Working Party (2017), Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679, European Commission, http://ec.europa.eu/justice/data-protection/index_en.htm.

[21] European Commission (2019), Europeans’ attitudes towards cyber security; Special Eurobarometer 499, European Union, October.

[49] European Parliament (2021), Artificial Intelligence Act, https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236_EN.html (accessed on 26 August 2023).

[16] Executive Office of the President (2014), Big Data and Privacy: A Technological Perspective, President’s Council of Advisors on Science and Technology, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf (accessed on 25 June 2018).

[38] Figlio, D., K. Karbownik and K. Salvanes (2016), “Education Research and Administrative Data”, in Handbook of the Economics of Education, Elsevier, https://doi.org/10.1016/B978-0-444-63459-7.00002-6.

[33] Future of Privacy Forum (2017), Unfairness by Algorithm: Distilling the Harms of Automated Decision-Making, Future of Privacy Forum, Washington. DC.

[6] García, D.; M. Goel; A. Agrawal and P. Kumaraguru (2018), “Collective aspects of privacy in the Twitter social network”, EPJ Data Science, Vol. 7/1, p. 3, https://doi.org/10.1140/epjds/s13688-018-0130-3.

[58] Golle, P. (2006), Revisiting the Uniqueness of Simple Demographics, https://crypto.stanford.edu/~pgolle/papers/census.pdf.

[8] Groves, R. and B. Harris-Kojetin (eds.) (2017), Federal Statistics, Multiple Data Sources, and Privacy Protection, National Academies Press, Washington, D.C., https://doi.org/10.17226/24893.

[34] Information Commissioner’s Office (2017), Feedback request – profiling and automated decision-making.

[19] Information is Beautiful (2019), World’s Biggest Data Breaches and Hacks, https://informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/ (accessed on 23 October 2019).

[20] Jardine, E. (2015), Global Cyberspace Is Safer than You Think: Real Trends in Cybercrime, Global Commission on Internet Governance, https://www.cigionline.org/sites/default/files/no16_web_0.pdf (accessed on 5 February 2018).

[10] Narayanan, A. and V. Shmatikov (2008), Robust De-anonymization of Large Sparse Datasets, IEEE Computer Society, Washington, DC, https://doi.org/10.1109/SP.2008.33.

[7] NCES (2011), Data Stewardship: Managing Personally Identifiable Information in Electronic Student Education Records, National Center for Education Statistics (NCES).

[1] OECD (2023), OECD Digital Education Outlook 2023: Towards an Effective Digital Education Ecosystem, OECD Publishing, Paris, https://doi.org/10.1787/c74f03de-en.

[45] OECD (2023), OECD Digital Education Outlook 2023: Towards an Effective Digital Education Ecosystem, OECD Publishing, Paris, https://doi.org/10.1787/c74f03de-en.

[48] OECD (2023), “Regulatory sandboxes in artificial intelligence”, OECD Digital Economy Papers, No. 356, OECD Publishing, Paris, https://doi.org/10.1787/8f80a0e6-en.

[35] OECD (2021), OECD Digital Education Outlook 2021: Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots, OECD Publishing, Paris, https://doi.org/10.1787/589b283f-en.

[46] OECD (2019), “OECD AI Principles overview”, OECD.AI Policy Observatory, https://oecd.ai/en/ai-principles.

[17] OECD (2016), “Managing Digital Security and Privacy Risk”, OECD Digital Economy Papers, No. 254, OECD Publishing, Paris, https://doi.org/10.1787/5jlwt49ccklt-en.

[52] OECD (2015), Digital Security Risk Management for Economic and Social Prosperity: OECD Recommendation and Companion Document, OECD Publishing, Paris, https://doi.org/10.1787/9789264245471-en.

[56] OECD (2015), Health Data Governance: Privacy, Monitoring and Research, OECD Health Policy Studies, OECD Publishing, Paris, https://doi.org/10.1787/9789264244566-en.

[2] OECD (2013), OECD Revised Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, http://www.oecd.org/sti/ieconomy/oecd_privacy_framework.pdf (accessed on 12 January 2018).

[55] OECD (2013), Strengthening Health Information Infrastructure for Health Care Quality Governance: Good Practices, New Opportunities and Data Privacy Protection Challenges, OECD Publishing, Paris, https://doi.org/10.1787/9789264193505-en.

[3] OECD (2013), The OECD Privacy Framework, https://www.oecd.org/sti/ieconomy/oecd_privacy_framework.pdf.

[50] Office of Education Technology (OET) (2023), AI and the Future of Teaching and Learning, https://tech.ed.gov/files/2023/05/ai-future-of-teaching-and-learning-report.pdf.

[40] Office of Management and Budget (United States) (2016), Commission on Evidence based Policymaking, https://obamawhitehouse.archives.gov/omb/management/commission_evidence (accessed on 24 November 2023).

[18] Office of the Privacy Commissioner of Canada (2015), Privacy Act Annual Report to Parliament 2014-15, Office of the Privacy Commissioner of Canada, https://www.priv.gc.ca/en/opc-actions-and-decisions/ar_index/201415/201415_pa/#heading-0-0-2 (accessed on 13 April 2018).

[22] Pew Research Center (2023), Growing public concern about the role of artificial intelligence in daily life, https://www.pewresearch.org/short-reads/2023/08/28/growing-public-concern-about-the-role-of-artificial-intelligence-in-daily-life/ (accessed on 29 August 2023).

[23] Pew Research Center (2019), Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information, https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/ (accessed on 29 August 2023).

[24] Polonetski, J. and J. Jerome (2014), Student Data: Trust, Transparency, and the Role of Consent, Future of Privacy Forum.

[14] Rocher, L., J. Hendrickx and Y. de Montjoye (2019), “Estimating the success of re-identifications in incomplete datasets using generative models”, Nature Communications, Vol. 10/1, https://doi.org/10.1038/s41467-019-10933-3.

[26] Singer, N. (2017), “How Google Took Over the Classroom”, New York Times, https://www.nytimes.com/2017/05/13/technology/google-education-chromebooks-schools.html.

[28] Singer, N. (2014), “With Tech Taking Over in Schools, Worries Rise”, The New York Times, https://www.nytimes.com/2014/09/15/technology/with-tech-taking-over-in-schools-worries-rise.html (accessed on 25 January 2018).

[31] Solove, D. (2006), “A Taxonomy of Privacy”, University of Pennsylvania Law Review, Vol. 154/3, pp. 477-560, https://www.law.upenn.edu/journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf (accessed on 15 March 2018).

[9] Sweeney, L. (1997), “Weaving Technology and Policy Together to Maintain Confidentiality”, The Journal of Law, Medicine & Ethics, Vol. 25/2-3, pp. 98-110, https://doi.org/10.1111/j.1748-720X.1997.tb01885.x.

[47] UNESCO (2021), Recommendation on the Ethics of Artificial Intelligence, https://unesdoc.unesco.org/ark:/48223/pf0000380455.

[51] White House Office of Science and Technology Policy (WHOSTP) (2022), Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People, White House, https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf.

[27] Zeide, E. and H. Nissenbaum (2018), “Learner Privacy in MOOCs and Virtual Education”, Theory and Research in Education, Vol. 16/3, pp. 280-307, https://doi.org/10.1177/1477878518815340.

Notes

← 1. A follow-up study suggests that more than 50% of the population in the United States can be uniquely identified using these three pieces of information (Golle, 2006[58]).

← 2. See blogpost by Melanie Lazare (Feb 2021): https://blog.google/outreach-initiatives/education/classroom-roadmap/#:~:text=Over%20the%20last%20year%2C%20the,from%2040%20million%20last%20year. And https://www.digitaltrends.com/web/google-g-suite-70-million/.

← 3. https://edu.google.com/k-12-solutions/privacy-security [Accessed 23/08/2018]

← 4. Moreover, large technology companies tend to operate worldwide while data protection regimes vary across countries. The GDPR establishes that when personal data are transferred from the EU to controllers, processors or other recipients in third countries or to international organisations, such transfers to and processing in third countries and international organisations may only be carried out in full compliance with the GDPR.

← 5. Privacy breaches (or violations) are different from privacy harms. Not all breaches cause harm, and harm may occur in the absence of a privacy breach. Calo (2011[57]) distinguishes between subjective and objective privacy harms. Subjective harms are unwelcome mental states such as embarrassment or fear that follow from unwanted observation. Objective harms are unanticipated or coerced use of information concerning a person against that person, for example identity theft.

← 6. There are multiple strategies to turn identifiable data into non-identifiable data. ‘De-identification’ refers to the removal or obscuring of personal identifiers, either direct or indirect. ‘Anonymization’ is a broader concept that encompasses other statistical disclosure limitation techniques and regulations for data access and use. See Glossary for details.

← 7. Re-identification refers to discovering the identity of an individual in a dataset where this information was not initially disclosed. If this discovery comes about against the will of the data subject, then his or her privacy is not preserved.

← 8. https://digital-strategy.ec.europa.eu/en/policies/data-governance-act.

← 9. https://oecd.ai/en/ai-principles

Legal and rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2023

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at https://www.oecd.org/termsandconditions.