Annex D. Leveraging big data to assess the diffusion of digital skill demands in labour markets

When using job postings to examine the diffusion of digital skills and technologies across labour markets, several previous studies have focused on counting how much more frequently terms related to digital technologies are mentioned across job postings.

Metrics based on a simple count of the frequency of digital skill mentions are, however, likely to miss whether such an increase has been concentrated in a small number of sectors and occupations or whether, instead, digital technologies and skill demands have actually spread across a wide variety of sectors and occupations, truly permeating labour markets. This latter question is arguably very important, as the widespread diffusion of digital technologies across sectors and job roles is what drives significant changes in the overall labour market that policy makers and firms need to adjust to.

To capture accurately the growth in the diffusion of digital skill demands across different sectors and occupations of the labour market, this chapter applies machine learning techniques to online job postings to examine how closely digital skills are interconnected with other skills across job vacancies and in employers’ recruitment requirements.

A vector representation of skill keywords in an n-dimensional space makes it possible to assess the connections across skills and, as such, the degree to which skills are pervasive in the labour market observed through online postings. The connections between a group of keywords can be represented by a so-called skill graph. In such a graph, the keywords extracted from online vacancies are the vertices (also called nodes). Two vertices are connected when the corresponding keywords co-occur in at least one job vacancy and disconnected when they never co-occur in the same vacancy.

A so-called adjacency matrix can be built to represent these skill co-occurrences.1 Whenever a skill “A” co-occurs with another skill “B” in a certain job vacancy, the cell in the row corresponding to skill “A” and the column corresponding to skill “B” gets the value 1; otherwise it is 0. Note that the adjacency matrix is symmetric, because the co-occurrence between skills is undirected and therefore commutative.
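The construction described above can be sketched as follows. This is a minimal illustration only: the `postings` list, the skill names and the helper `build_adjacency` are hypothetical and are not taken from the database used in the analysis.

```python
from itertools import combinations

import numpy as np


def build_adjacency(postings):
    """Return (skills, A), where A[i, j] = 1 if skills i and j co-occur
    in at least one posting and 0 otherwise."""
    skills = sorted({s for posting in postings for s in posting})
    index = {s: i for i, s in enumerate(skills)}
    A = np.zeros((len(skills), len(skills)), dtype=int)
    for posting in postings:
        # Every unordered pair of distinct skills in a posting is one co-occurrence.
        for a, b in combinations(sorted(set(posting)), 2):
            A[index[a], index[b]] = 1
            A[index[b], index[a]] = 1  # undirected, hence symmetric
    return skills, A


# Hypothetical toy data: each inner list holds the keywords of one vacancy.
postings = [
    ["python", "sql", "communication"],
    ["sql", "excel"],
    ["python", "machine learning"],
]
skills, A = build_adjacency(postings)
```

Because only pairs of distinct skills are recorded, the diagonal of `A` stays at 0.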

This adjacency matrix can then be used to calculate the eigenvector centrality (EVC) and the local clustering coefficient (LCC) of each skill. The power iteration algorithm is used to derive the relative centrality score of each vertex v in the network. Given a graph G with adjacency matrix A, where the entry $a_{v,t}$ equals 1 if vertex v is connected to vertex t and 0 otherwise, M(v) is the set of neighbours of v and λ is a constant, the relative centrality score of a certain skill can be defined as:

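$$\mathrm{EVC}_v = \frac{1}{\lambda} \sum_{t \in M(v)} \mathrm{EVC}_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t}\, \mathrm{EVC}_t$$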
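A minimal sketch of power iteration for the eigenvector centrality is given below. It is illustrative rather than the implementation used in the analysis: the function name, the tolerance, the unit-length normalisation and the shift by the identity matrix (added here so that the iteration also converges on bipartite graphs) are all assumptions.

```python
import numpy as np


def eigenvector_centrality(A, max_iter=1000, tol=1e-10):
    """Approximate the dominant eigenvector of a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Assumption: iterate on A + I. The shift leaves the eigenvectors unchanged
    # but avoids oscillation on bipartite graphs.
    M = A + np.eye(n)
    x = np.full(n, 1.0 / n)  # uniform starting scores
    for _ in range(max_iter):
        x_new = M @ x                    # each score becomes the sum of its neighbours' scores
        x_new /= np.linalg.norm(x_new)   # rescale to unit length
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical three-skill path graph: skill 1 is linked to skills 0 and 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
evc = eigenvector_centrality(A)
```

In this toy graph the middle skill receives the highest score, since it is the only one connected to both of the others.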

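Since this is an undirected graph, the local clustering coefficient of a vertex $v_i$, whose neighbourhood $N_i$ contains $k_i$ vertices and where E is the set of edges of the graph, can be defined as: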

$$\mathrm{LCC}_i = \frac{2\,\left|\{e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E\}\right|}{k_i (k_i - 1)}$$
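The local clustering coefficient of an undirected graph can be computed directly from the adjacency matrix, as in the sketch below. It assumes the standard undirected definition (twice the number of links among a vertex's neighbours, divided by k(k-1)) and sets the coefficient to 0 for vertices with fewer than two neighbours; the function name and the toy graph are hypothetical.

```python
import numpy as np


def local_clustering(A):
    """Local clustering coefficient of each vertex of an undirected simple graph."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)  # degree of each vertex
    # The i-th diagonal entry of A^3 counts closed walks of length 3 through i,
    # which equals twice the number of links among the neighbours of i.
    closed = np.diag(A @ A @ A)
    denom = k * (k - 1)
    lcc = np.zeros_like(k)
    mask = denom > 0  # vertices with fewer than two neighbours keep LCC = 0
    lcc[mask] = closed[mask] / denom[mask]
    return lcc


# Hypothetical graph: skills 0, 1, 2 form a triangle; skill 3 is linked only to skill 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
lcc = local_clustering(A)
```

Skill 0 has three neighbours of which only one pair is linked, so its coefficient is 1/3, while skills 1 and 2 each have fully interlinked neighbours and score 1.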

Both measures serve as important indicators of the contextual diversity and the importance of certain skills relative to other skills in the network. In graph theory, the “eigenvector centrality” and the “local clustering coefficient” are two measures commonly used to assess the influence of a node in a network or, in other words, to measure the degree and quality of the connections of a keyword with the rest of the words in the text under examination. A variant of eigenvector centrality is used in Google’s PageRank algorithm to quantify the importance of web pages based on the links among them. The same measures can, however, be used to capture the number of connections that a skill keyword has with other skills as well as the ‘quality’ of those connections, where higher quality connections are those with other skills that are also highly connected to the rest of the skills in the vector space.

Finally, a unidimensional measure of skill diffusion can be created by normalising and rescaling the eigenvector centrality and the local clustering coefficient and combining them into a single measure for each skill i at time t:

$$\mathrm{Diffusion}_{it} = \frac{\mathrm{EVC}_{it} + (1 - \mathrm{LCC}_{it})}{2}$$
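The combination of the two measures can be sketched as below. The text does not specify how the normalisation and rescaling are carried out, so min-max scaling to the [0, 1] interval is assumed here purely for illustration; the helper names and the input values are hypothetical.

```python
import numpy as np


def min_max(x):
    """Rescale values to [0, 1]. Assumed normalisation, not stated in the source."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def diffusion_index(evc, lcc):
    """Average of the rescaled EVC and the complement of the rescaled LCC."""
    return (min_max(evc) + (1.0 - min_max(lcc))) / 2.0


# Hypothetical scores for three skills in a given year.
evc = [0.2, 0.5, 0.8]
lcc = [1.0, 0.5, 0.0]
diffusion = diffusion_index(evc, lcc)
```

Under this assumption the index lies between 0 and 1, and it is highest for skills that combine a high centrality with a low clustering coefficient.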

The change over time of the Diffusion index is used in the analysis above to measure the degree to which skills have become pervasive in the labour market. The Diffusion index is computed for each skill keyword analysed in the database of online job postings and compared to the average diffusion across all skills in each economy. Faster diffusion of a skill means an above-average increase in the connections of that particular skill with other skill demands across job postings, and hence an increase in how much that skill is permeating the labour market in a variety of different work contexts and job roles.
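The comparison with the average can be illustrated with a short sketch. The values and the function name are hypothetical, and the sketch assumes that "above average" refers to the change in the index relative to the mean change across all skills in the same economy.

```python
import numpy as np


def relative_diffusion_growth(d_start, d_end):
    """Change in each skill's index minus the average change across all skills."""
    change = np.asarray(d_end, dtype=float) - np.asarray(d_start, dtype=float)
    return change - change.mean()


# Hypothetical Diffusion index values for three skills in two different years.
d_start = [0.2, 0.4, 0.6]
d_end = [0.5, 0.5, 0.6]
relative = relative_diffusion_growth(d_start, d_end)
faster_than_average = relative > 0
```

A positive value marks a skill whose connections with other skill demands have grown faster than average over the period.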

Note

1. The extracted skill graph is an undirected graph without self-loops, meaning that skills do not co-occur with themselves. As a result, the diagonal of the adjacency matrix is 0.

© OECD 2023