5.2. Science and digitalisation
Advances in scientific knowledge are key to developing new digital technologies. Over the last decade, China almost trebled its contribution to computer science journals, overtaking the United States in the production of scientific documents in this field. However, the share of China’s documents that are among the world’s top-cited (top 10%, normalised by type of document and field) is still close to 7%, below the world average and well below the United States at 17%. The share of computer science publications from China that are highly cited has nonetheless more than doubled since 2006, making China the second-largest producer of highly cited publications worldwide. In some countries, such as Italy, Israel, Luxembourg and Poland, scientific research in computer science has a much higher relative citation rate than overall scientific production within those countries. Nearly 20% of computer science publications by Switzerland-based authors feature among the world’s top 10% most-cited scientific documents. This figure reaches 25% for Luxembourg, although from a much smaller volume of scientific production.
Scientific activity makes intensive use of digital tools and generates digital assets in the form of new data and software. A new 2018 OECD pilot survey, the International Survey of Scientific Authors (ISSA), focuses on measuring the digitalisation of science. Preliminary findings show that, on average, 60% or more of scientific publications generate new data and new software code. Countries with higher levels of R&D intensity are, on average, also more likely to report high shares of scientific production that generate new computer code, either alone or in combination with new data. More than 45% of survey respondents resident in Korea reported developing new code, mostly in combination with data, compared to 20% in Mexico. Data generation is more widespread and more evenly distributed. In computer science and decision sciences, more than 50% of respondents generate code, closely followed by physics and astronomy. Code generation is least common in the arts and humanities, and in chemistry, at less than 10% of respondents.
Scientific research represents an important foundation for technological advancement and innovation. By identifying non-patent literature, in particular scientific articles, cited in patent documents, it is possible to gain insights into linkages between scientific progress and new inventions. Digital technologies build mostly on digital-related science, with electrical or information engineering articles cited in 37% of digital patents and computer and information sciences articles cited in 20%. However, digital technologies can be applied in a wide range of fields, so patented digital technologies also draw on scientific production from a broad variety of other areas, especially the physical sciences (12%) and various medical domains, in addition to art, languages and others.
The United States accounted for around 70% more top-cited scientific publications on computer science than China in 2016. This gap has shrunk from nearly 500% in 2006.
Definitions
Computer science publications consist of citeable documents (articles, conference proceedings and reviews) featured in journals specialising in this field. “Top-cited publications” are the 10% most-cited papers normalised by scientific field and type of document (OECD and SCImago Research Group, 2016).
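The top-cited indicator defined above can be illustrated with a minimal sketch. It assumes a simple rule (rank documents by citations within each field and document type, then flag the top 10%) and ignores ties and citation windows; it is not the procedure used by OECD and SCImago. The function names, country labels and citation counts are hypothetical.

```python
from collections import defaultdict

def top_cited_flags(records, top_share=0.10):
    """Flag records in the top `top_share` by citations within their
    (field, document type) group. Ties are broken arbitrarily (assumption)."""
    groups = defaultdict(list)
    for idx, (_country, field, doc_type, cites) in enumerate(records):
        groups[(field, doc_type)].append((cites, idx))
    flags = [False] * len(records)
    for members in groups.values():
        members.sort(key=lambda m: m[0], reverse=True)
        # Keep at least one document per group so small groups are not empty.
        n_top = max(1, round(len(members) * top_share))
        for _cites, idx in members[:n_top]:
            flags[idx] = True
    return flags

def top_cited_share_by_country(records, top_share=0.10):
    """Share of each country's documents that are flagged as top-cited."""
    flags = top_cited_flags(records, top_share)
    totals = defaultdict(int)
    tops = defaultdict(int)
    for (country, _field, _doc_type, _cites), flag in zip(records, flags):
        totals[country] += 1
        tops[country] += int(flag)
    return {country: tops[country] / totals[country] for country in totals}

# Hypothetical records: (country, field, document type, citations).
sample_records = (
    [("A", "computer science", "article", c) for c in (100, 5, 3, 1, 0)]
    + [("B", "computer science", "article", c) for c in (50, 4, 2, 2, 1)]
    + [("A", "computer science", "proceedings", c) for c in (2, 1, 0, 0, 0)]
    + [("B", "computer science", "proceedings", c) for c in (9, 1, 1, 0, 0)]
)

print(top_cited_share_by_country(sample_records))
```

In this made-up sample, country B's proceedings paper with 9 citations is flagged while its article with 50 citations is not, because each document is compared only with others of the same field and type.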
Research data include numerical scores, textual records, images and sounds that can be used as primary sources for scientific research. Code includes custom-developed software and code, laboratory notebooks and other computer-enabled documents describing every step of the research work and protocols followed.
Digital (ICT) patent families are identified using the list of IPC codes in Inaba and Squicciarini (2017).
Measurability
Identifying the digital-related content of research outputs is a major challenge. Bibliographic indices provide a readily available source of data for illustrative purposes, though with interpretability and coverage limitations. Using publishers’ journal classifications would understate the digital intensity of science, owing to the pervasiveness of digital research. Alternatives are scanning publications for content or directly contacting authors. The OECD ISSA 2018 survey takes the latter approach in order to gather insights on the use of digital tools and the contribution of science to the digitalisation process (see page 5.6). It should be noted, however, that not all so-called “data scientists” publish in scholarly journals, which form the basis for identifying and contacting authors.
Published patent documents contain references to prior art on which inventions rely, including previous patents and non-patent literature (NPL). Analysing the link between patents and scientific literature cited in patent documents helps to uncover the links between science and innovation. The Max Planck Digital Library has developed robust methods to link NPL with scientific reference data (see Knaus and Palzenberger, 2018). This analysis is based on data elaborated by the Max Planck Institute for Innovation and Competition using information provided in the Clarivate Web of Science (see Poege et al., 2018).
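Once NPL references have been linked to scientific articles, an indicator such as "share of digital patents citing articles in a given field" reduces to a simple count. The sketch below is illustrative only: the input structure, function name and sample patents are hypothetical, and the choice of denominator (all patents, or only patents with at least one linked NPL reference) is an assumption that the text above does not settle.

```python
def share_of_patents_citing(patent_fields, field, only_patents_with_npl=False):
    """Share of patents citing at least one article in `field`.

    patent_fields maps a patent identifier to the science fields of the
    articles it cites (an empty list means no linked NPL reference).
    """
    patents = {pid: set(fields) for pid, fields in patent_fields.items()}
    if only_patents_with_npl:
        # Restrict the denominator to patents with at least one linked article.
        patents = {pid: fields for pid, fields in patents.items() if fields}
    if not patents:
        return 0.0
    citing = sum(1 for fields in patents.values() if field in fields)
    return citing / len(patents)

# Hypothetical patents and cited fields, for illustration only.
sample_patents = {
    "P1": ["computer and information sciences", "physical sciences"],
    "P2": ["electrical or information engineering"],
    "P3": [],
    "P4": ["computer and information sciences"],
}

print(share_of_patents_citing(sample_patents, "computer and information sciences"))
```

Because one patent can cite articles from several fields, shares computed this way for different fields need not sum to 100%.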