Annex D. Self-contained labour areas (SLA-ZTA) – Python package

The Python package SLA-ZTA (for self-contained labour areas – zones de travail autonomes) was created as an open-source alternative to code originally written in SAS®, thereby increasing the usability, adaptability and transferability of the methodology. The SLA-ZTA code is currently released and maintained as a PyPI project, available at https://pypi.org/project/SLAZTA/.

The methodology embedded in the SLA-ZTA code reflects a multidirectional approach to the delineation of functional areas. The computational core draws largely on the analysis of travel-to-work (TTW) areas in Britain (Coombes and Bond, 2008[1]), and the code was used to delineate functional areas outside major metropolitan areas in Canada. More details on the data source, the geographic unit of analysis and the results for Canada are given in Chapter 5.

The SLA-ZTA Python code is organised into six core modules, which makes it adaptable and allows the specifications to be fine-tuned for different applications. The approach underpinning this model is particularly suitable for delineating functional areas in what would generally be considered rural or non-metro areas, because the clustering procedure emphasises the strength of commuting flows and, in the existing specification of the model, does not impose a minimum population size on each cluster.

The computational workflow from input matrices to the results generated by the model can be summarised as follows:

  1. To create the SLA clusters, the commuting data (Non-symmetric Matrix Flow) are first reconfigured into a set of data matrices that express, for each area, a measure of “success” in self-containment of the labour force (explained in step 2) and the relationships between that area and every other area with which it shares a commuting relationship (Symmetric Strength Matrix). After this reconfiguration, the code completes module one by checking whether all areas have achieved success. If not, the workflow moves on to the second module.

  2. The code then determines which area is currently farthest from achieving success. Because this method is aimed at delineating non-metropolitan areas, no minimum population size is required to achieve success. An area achieves success when it reaches the level of self-containment required for its population size and number of employed workers; this requirement is a sliding scale from 75% self-containment for larger areas to 90% self-containment for smaller areas.

  3. For the least successful area, the code then determines the other area with which it shares the strongest reciprocal connection. This is done using the following equation:

$$
\frac{F_{a,b}}{R_a} \times \frac{F_{a,b}}{W_b} + \frac{F_{b,a}}{R_b} \times \frac{F_{b,a}}{W_a}
$$

where $F_{a,b}$ is the number of journeys to work from area $a$ to area $b$; $R_a$ is the number of workers who live in area $a$; and $W_a$ is the number of people who work in area $a$.

  4. The least successful area is joined to the area with which it has the strongest connection, creating a new area.

  5. The success metrics (Success Table) and commuting relationships (Symmetric Strength Matrix) are recalculated to include connections between the new area and the remaining areas. Information on previously existing areas (e.g. previous configurations of the data) is discarded at this step in the programme. This ensures that the amount of memory used decreases rather than increases as the programme runs, allowing for much larger data sets to be processed.

  6. The code begins again at step 1 and repeats the process until all areas have achieved self-containment; that is, it repeats until all areas are included in the Success Table.
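The connection-strength equation in step 3 can be illustrated with a short NumPy sketch. It is not taken from the SLA-ZTA package: the function name `strength_matrix` and the toy flow matrix are hypothetical, and the sketch assumes that rows of the flow matrix are areas of residence, columns are areas of work, and the diagonal holds workers who live and work in the same area.

```python
import numpy as np

def strength_matrix(flows):
    """Symmetric strength matrix from a non-symmetric flow matrix.

    flows[a, b] is the number of journeys to work from area a to area b
    (assumed layout: rows = residence, columns = workplace).
    """
    F = np.asarray(flows, dtype=float)
    R = F.sum(axis=1)  # R_a: workers who live in area a (row totals)
    W = F.sum(axis=0)  # W_b: people who work in area b (column totals)
    denom = np.outer(R, W)
    # term[a, b] = F[a, b] / R[a] * F[a, b] / W[b]; left at 0 where undefined
    term = np.divide(F * F, denom, out=np.zeros_like(F), where=denom > 0)
    S = term + term.T          # add the reverse direction (b to a)
    np.fill_diagonal(S, 0.0)   # an area is not paired with itself
    return S

# Hypothetical flows between three areas
flows = [[80, 20, 0],
         [10, 60, 30],
         [0, 5, 95]]
S = strength_matrix(flows)
```

Because each entry sums the two directional terms, the resulting matrix is symmetric, which matches the "reciprocal connection" described in step 3.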

Figure A D.1 provides an overview of the above process.

Figure A D.1. Scheme of the computational flow implemented by the Python package SLA-ZTA

Source: Provided by Statistics Canada, 2019.
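As a rough companion to the workflow, the loop can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the package's code. All function names are hypothetical. Self-containment is assumed to be the smaller of the supply-side and demand-side ratios, area size is assumed to be the number of resident workers, and the size bounds of the sliding scale (1 000 and 25 000) are invented for the example; only the 75% and 90% end points come from the text.

```python
import numpy as np

def required_containment(size, small=1_000, large=25_000, hi=0.90, lo=0.75):
    # Sliding scale: 90% for small areas, 75% for large ones, linear in between.
    # The bounds `small` and `large` are hypothetical.
    frac = np.clip((size - small) / (large - small), 0.0, 1.0)
    return hi - frac * (hi - lo)

def safe_ratio(num, den):
    # Element-wise num / den, returning 0 where the denominator is 0.
    num, den = np.asarray(num, float), np.asarray(den, float)
    out = np.zeros(np.broadcast(num, den).shape)
    return np.divide(num, den, out=out, where=den > 0)

def shortfall(F):
    # Distance to success: required minus achieved self-containment.
    R, W, stay = F.sum(axis=1), F.sum(axis=0), np.diag(F)
    achieved = np.minimum(safe_ratio(stay, R), safe_ratio(stay, W))
    return required_containment(R) - achieved

def strength(F):
    # Reciprocal connection strength between every pair of areas.
    term = safe_ratio(F * F, np.outer(F.sum(axis=1), F.sum(axis=0)))
    S = term + term.T
    np.fill_diagonal(S, 0.0)
    return S

def cluster(flows, labels):
    F = np.array(flows, dtype=float)
    groups = [[name] for name in labels]
    isolated = [False] * len(groups)   # areas with no commuting links
    while True:
        gap = shortfall(F)
        gap[np.array(isolated)] = -np.inf
        worst = int(np.argmax(gap))    # area farthest from success
        if gap[worst] <= 0:
            break                      # every remaining area is successful
        links = strength(F)[worst]
        partner = int(np.argmax(links))
        if links[partner] <= 0:
            isolated[worst] = True     # nothing to merge with
            continue
        # Merge partner into worst; the old rows and columns are discarded.
        F[worst, :] += F[partner, :]
        F[:, worst] += F[:, partner]
        F = np.delete(np.delete(F, partner, axis=0), partner, axis=1)
        groups[worst] += groups[partner]
        del groups[partner], isolated[partner]
    clusters = [g for g, iso in zip(groups, isolated) if not iso]
    unassigned = [g for g, iso in zip(groups, isolated) if iso]
    return clusters, unassigned

# Hypothetical data: A and B exchange commuters, C is self-contained, D has no data
flows = [[600, 400, 0, 0],
         [300, 700, 0, 0],
         [0, 0, 30_000, 0],
         [0, 0, 0, 0]]
clusters, unassigned = cluster(flows, ["A", "B", "C", "D"])
```

In this toy example A and B are merged into one area, C is already successful on its own, and D is left unassigned because it has no commuting information. Deleting the merged row and column at each step mirrors the point that earlier configurations are discarded, so the matrices shrink as the run proceeds.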

After the above clustering process has been completed, a number of areas will usually remain unassigned. These are either areas that are self-contained without clustering or areas for which no commuting information is present or available.

A secondary programmatic process is run to deal with these unassigned areas and to modify the existing SLAs, where necessary, to ensure a set of geographically contiguous and logical areas that cover the whole set of areas. For the Canadian specification, this was done using a ruleset based on the rules already in use by the CMA/CA delineation process. This secondary process is not available in the Python package as it makes extensive use of the Canadian Statistical Geographic Classification system and would currently be unsuitable for use on other data.

When compared to the results of the R package LabourMarketAreas (LMA), it was found that the two systems produce largely comparable results for Canada when the same parameters are given to both systems. There are three major differences between the programmes that can affect the results that are produced:

  1. The LMA system excludes municipalities that do not have both in-commuting and out-commuting data. The degree of impact that this has on the data will depend on the suppression procedures that are used and the frequency with which this situation occurs in the area of interest.

  2. The two systems select areas in slightly different ways, with the LMA system selecting areas with the lowest self-containment and the SLA system selecting areas with the farthest distance to go to reach success. Because smaller areas have a higher threshold to reach for success, this means that smaller areas will tend to be chosen first more often even when their overall level of self-containment is identical to larger areas.

  3. The LMA system allows clusters to be dissolved if they are still not successful after clustering, while the SLA system does not. This is a methodological difference due to the focus of the SLA areas on non-metropolitan areas where the choice of pairing areas can be quite small.
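The second difference can be made concrete with a small sketch that contrasts the two selection rules as they are described above. It does not reproduce either package: the two areas, their values, and the size bounds of the sliding threshold are hypothetical, and only the 75% and 90% end points come from the text.

```python
def required_containment(size, small=1_000, large=25_000, hi=0.90, lo=0.75):
    # Hypothetical sliding scale: 90% for small areas, 75% for large ones.
    frac = min(max((size - small) / (large - small), 0.0), 1.0)
    return hi - frac * (hi - lo)

# Two hypothetical areas, neither of which has reached its threshold yet
areas = {
    "small": {"size": 2_000, "containment": 0.80},
    "large": {"size": 30_000, "containment": 0.70},
}

def gap(name):
    # Distance left to go before the area counts as successful
    a = areas[name]
    return required_containment(a["size"]) - a["containment"]

# Rule based on the lowest self-containment
picked_by_containment = min(areas, key=lambda n: areas[n]["containment"])
# Rule based on the farthest distance to success
picked_by_gap = max(areas, key=gap)
```

Here the lowest-containment rule picks the large area, while the distance-to-success rule picks the small one, because the small area faces a higher threshold even though its self-containment is higher.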

Because of the above differences, the SLA system tends to produce a slightly greater number of areas than the LMA system when the same parameters are used. Apart from the exclusions mentioned in point 1, this reflects minor differences in how areas are subdivided and appears to be largely due to the SLA system being designed to find small functional areas where possible.

Finally, it should be recalled that the Canadian self-contained labour areas presented in Chapter 5 were designed from the outset to complement the existing system of Census Metropolitan Areas and Census Agglomerations. For that reason, some programmatic choices and threshold values were adapted to the specific need to create usable non-metropolitan functional areas. The SLA-ZTA Python package has also been applied to other national contexts as part of this research undertaking, with minimal modifications to the package.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

https://doi.org/10.1787/07970966-en

© OECD 2020

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at http://www.oecd.org/termsandconditions.