5. Piloting a new disability assessment in four regions of Italy

The assessment of civil invalidity in Italy, which determines a person’s rights and entitlements to benefits and services, is outdated and incomplete. It is limited to identifying a medical condition, or impairment, which determines the percentage of civil invalidity, without considering the person’s actual experience of disability or the context in which the person lives. With the passage of the Enabling Act in 2021 (Law 227/2021), Italy has taken a first step towards reforming its disability policies. The implementing decrees of the Enabling Act, which are currently being drafted and will also draw on the contribution of the project presented in this report, are moving towards requiring that the assessment of disability, and the revision of its basic assessment processes, be carried out in accordance with the provisions of the Convention on the Rights of Persons with Disabilities. This includes adopting the WHO Disability Assessment Schedule (WHODAS), the focus of the pilot project carried out in Italy, within the new basic assessment system provided for by the law.

To prepare the ground for reform, the WHODAS tool was piloted in four regions of Italy – Campania, Lombardy, the Autonomous Region of Sardinia (henceforth, Sardinia), and the Autonomous Province of Trento (henceforth, Trentino) – to test the feasibility of including functioning information in the current assessment of civil invalidity. WHODAS was developed by the WHO, in alignment with the International Classification of Functioning, Disability and Health (ICF), as a tool to identify the kind and nature of the problems people face in their lives. WHODAS has been tested successfully in many countries and contexts. While Italy can draw on experiences in other countries, it was also important to test the validity and reliability of the tool in Italy, and the ability of social workers to implement it. This chapter summarises the findings of the pilot, which took place from October 2022 to April 2023.

In the ICF framework, information about categories of Activities and Participation can be collected either from the perspective of capacity (reflecting exclusively the expected ability of a person to perform activities considering their health conditions and impairments) or the perspective of performance (reflecting the actual performance of activities in the real-world environmental circumstances in which a person lives). Information about capacity typically represents the results of a clinical inference or judgment based on medical information, while performance is a true description of what occurs in a person’s life. The two perspectives are therefore very different, although capacity constitutes a determinant of performance.

The ICF understands “disability” to be any level of difficulty in functioning in some domain, from the perspective of performance. The WHO has developed, tested, and recommended WHODAS as a tool that can capture the performance of activities by an individual in his or her daily life and actual environment. The “actual environment” is represented in the ICF in terms of environmental factors that act either as facilitators (e.g. assistive devices, supports, home modifications) or as barriers (e.g. inaccessible houses, streets and public buildings, stigma, and discrimination). The WHODAS questionnaire is structured around six basic functioning domains: cognition, mobility, self-care, getting along with people, life activities, and participation.

The “clinical” version of the WHODAS questionnaire collects information about problems in functioning – i.e. disability – by means of a face-to-face interview conducted by a trained interviewer who asks a set of standardised questions and, if necessary, follow-up probe questions. WHODAS uses a 5-level response scale (1 = None, 2 = Mild, 3 = Moderate, 4 = Severe, 5 = Extreme or Cannot do) to rate each question. In extraordinary circumstances (e.g. COVID-19 lockdown), WHODAS can be administered in a telephone or video interview by the trained professional. Respondents are informed that their answers about each domain of functioning should adopt the perspective of performance, i.e. that they should describe what they do considering the experiences in their daily life and the environmental barriers and facilitators they experience. For the pilot, the 36-item version of WHODAS was chosen to create a full picture of the disability experienced by the respondent in their everyday life.

A total of 3 307 individuals participated in the pilot. The data for 65 individuals were excluded from the analysis because of a high share of missing responses. The socio-demographic characteristics of the remaining N = 3 242 individuals are shown region by region in Table 5.1. The proportion of male participants was below 50% in all four regions. Mean ages differed significantly across regions: 52.2 years in Campania, 49.8 years in Lombardy, 50.7 years in Sardinia, and 48.8 years in Trentino. Participants reported an average of about 11 years of education in all regions. Most participants were married, and most were living independently in the community. The percentage of individuals in assisted living was highest in Trentino.

Employment data were collected differently across regions, so detailed information is missing for part of the data collected in Campania: it was not possible to determine whether unemployment was health-related, or whether work was for an employer or self-employed. Overall, 39.7% of participants reported having paid work, 21.6% being unemployed for health reasons, and 15.6% being unemployed for other reasons. The share in paid work was especially high in Lombardy (47.7%) and Trentino (48.2%).

Table 5.2 presents the frequencies and percentages of the observed ICD-11 diagnostic chapters, with the caveat that data on health conditions were collected differently in the four regions. Health condition codes were linked to the closest chapter of ICD-11, the latest version of WHO’s International Classification of Diseases. Many people in the data set have more than one diagnosis. If several of a person’s diagnoses linked to the same ICD chapter, that chapter was counted only once. The situation is different for people with conditions from different ICD chapters. Information on the priority of different diagnoses was unavailable for most of the data, so the analyses by health condition include all ICD chapters recorded for a person. As a result, the total number of conditions in Table 5.2 is larger than the sample size, because a person with conditions from two different chapters is counted twice. This should not affect the findings by health condition.
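The counting rule above (each chapter counted once per person, but a person counted under every distinct chapter) can be sketched as follows. This is a minimal illustration under stated assumptions: the step that links national diagnosis codes to ICD-11 chapters is not shown, the function name `persons_per_chapter` is hypothetical, and the example records are made up.

```python
from collections import Counter
from typing import Dict, List


def persons_per_chapter(chapters_by_person: Dict[str, List[str]]) -> Counter:
    """Count persons per ICD-11 chapter (hypothetical helper).

    Duplicate chapters for one person are collapsed with set(), but a person
    with conditions in two different chapters is counted under both, so the
    total can exceed the number of persons.
    """
    counts: Counter = Counter()
    for chapters in chapters_by_person.values():
        counts.update(set(chapters))
    return counts


# Illustrative records: "02" = Neoplasms, "11" = circulatory system,
# "15" = musculoskeletal system (ICD-11 chapter numbers)
example = {"p1": ["02", "02"], "p2": ["02", "11"], "p3": ["15"]}
counts = persons_per_chapter(example)
print(sum(counts.values()), "chapter entries for", len(example), "persons")
# 4 chapter entries for 3 persons
```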

Figure 5.1 visualises how the 36 items of the WHODAS questionnaire were rated. The percentage of missing values was highest, at about 50%, for items D5.5 to D5.8, which assess difficulties at work (or in school). These four items were removed from the calculation of the WHODAS score because of their high share of missing values. More than 30% of values were also missing for two other items, D5.1 (Taking care of household responsibilities) and D5.2 (Doing most important household tasks), because these two questions were not consistently asked in all regions at the start of the pilot.
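A sketch of the per-item missingness check may help. It assumes responses are stored as dictionaries keyed by item code, with None for a missing answer. The report does not state a formal exclusion rule, so the 0.4 threshold used below is a hypothetical value chosen only because it separates the roughly 50% missing work items from the retained household items; the function names are likewise hypothetical.

```python
from typing import Dict, List, Optional

Response = Dict[str, Optional[int]]  # item code -> rating 1-5, None if missing


def missing_share_by_item(responses: List[Response], items: List[str]) -> Dict[str, float]:
    """Share of respondents with no rating for each item."""
    n = len(responses)
    return {item: sum(r.get(item) is None for r in responses) / n for item in items}


def retained_items(shares: Dict[str, float], max_share: float) -> List[str]:
    """Items whose missing share stays below max_share (hypothetical rule)."""
    return [item for item, share in shares.items() if share < max_share]


# Illustrative responses for three items (not pilot data)
responses = [
    {"D1.1": 2, "D5.1": 1, "D5.5": None},
    {"D1.1": 3, "D5.1": None, "D5.5": None},
    {"D1.1": 1, "D5.1": 2, "D5.5": 4},
    {"D1.1": 2, "D5.1": 3, "D5.5": 1},
]
shares = missing_share_by_item(responses, ["D1.1", "D5.1", "D5.5"])
print(shares)                       # {'D1.1': 0.0, 'D5.1': 0.25, 'D5.5': 0.5}
print(retained_items(shares, 0.4))  # ['D1.1', 'D5.1']
```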

Figure 5.2 shows the distribution of the total raw scores obtained by adding up the 32 retained WHODAS items. With 32 items each rated from 1 to 5, the total raw score ranges from 32 to 160, although a few totals below 32 occur because scores were computed on the raw data, including respondents with some missing values (less than 20%). Coloured segments in Figure 5.2 indicate the position and value of the 1st, 2nd, and 3rd quartiles, with a median score (2nd quartile) of 75. The density lines in Figure 5.3 show the density of the observed scores (black line) and the corresponding normal distribution with the same mean and standard deviation (dotted line). Scores in the Italian sample are approximately normally distributed, a common finding also in other countries where WHODAS was pilot tested (including Bulgaria, Greece, Latvia, Lithuania, Romania, and Seychelles).
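The raw scoring described above can be sketched as a simple sum. This is a minimal illustration, not the pilot's actual code: it assumes each respondent's 32 retained ratings are held in a list with None marking a missing answer, the name `raw_score` is hypothetical, and applying the "less than 20%" missing limit as a hard cut-off is an assumption.

```python
from typing import Optional, Sequence


def raw_score(ratings: Sequence[Optional[int]], max_missing_share: float = 0.2) -> Optional[int]:
    """Sum the available 1-5 ratings; return None if too many items are missing.

    Hypothetical helper: the 20% limit mirrors the "less than 20%" missing
    values mentioned in the text and is an assumption about how it was applied.
    """
    missing = sum(r is None for r in ratings)
    if missing / len(ratings) >= max_missing_share:
        return None  # too incomplete to score
    # Missing items contribute nothing, which is why totals below 32 can occur.
    return sum(r for r in ratings if r is not None)


print(raw_score([1] * 32))               # 32, minimum for a complete record
print(raw_score([5] * 32))               # 160, maximum
print(raw_score([1] * 29 + [None] * 3))  # 29, below 32 because of missing items
print(raw_score([3] * 25 + [None] * 7))  # None, 7 of 32 items (about 22%) missing
```

Summing only the available items keeps the rule transparent, at the cost of slightly deflating totals for incomplete records; a pro-rated sum would be an alternative design.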

The distributions of the WHODAS raw scores in the four participating regions show some, albeit small, differences (Figure 5.4). The highest median WHODAS score (blue dotted line) is found in Campania (Q2 = 84) and the lowest in Trentino (Q2 = 70). Higher WHODAS raw scores indicate higher levels of disability among those going through a disability assessment. Otherwise, the WHODAS raw scores are approximately normally distributed in all four regions.

One objective of the pilot was to assess the validity and reliability of the WHODAS instrument in Italy. This was done through Rasch analysis, a statistical method from the field of probabilistic measurement first introduced by the Danish mathematician Georg Rasch (Rasch, 1960[1]). Rasch analysis essentially tests several measurement assumptions (Bond, 2015[2]; Tennant and Conaghan, 2007[3]): (1) the targeting of the scale, (2) the model reliability, (3) the ordering of the items’ response options, (4) the absence of correlation between items (so-called Local Item Dependencies, or LID), (5) the fit of the items to the Rasch model, (6) the absence of effects of person factors such as gender and age on item responses (so-called Differential Item Functioning, or DIF), and (7) the unidimensionality of the questionnaire. If these measurement assumptions are met, a questionnaire can be considered psychometrically sound, and the derived total scores can be considered interval-scaled and suitable for measurement.
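The report does not state which Rasch model variant or software was used. For polytomous items such as the 5-level WHODAS ratings, a common choice is the partial credit model, shown here as an assumption to make "thresholds" and their ordering (assumption #3) concrete. Here θ_n is the location of person n, δ_ik the k-th threshold of item i, and m_i the number of thresholds (m_i = 4 once ratings 1-5 are recoded to 0-4):

```latex
P(X_{ni} = x) =
\frac{\exp\left(\sum_{k=0}^{x} (\theta_n - \delta_{ik})\right)}
     {\sum_{h=0}^{m_i} \exp\left(\sum_{k=0}^{h} (\theta_n - \delta_{ik})\right)},
\qquad x = 0, 1, \dots, m_i,
\qquad \text{with } \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0 .
```

Thresholds are ordered when δ_i1 < δ_i2 < δ_i3 < δ_i4; disordering means this sequence is violated. With m_i = 1 the model reduces to the dichotomous Rasch model, P(X_ni = 1) = exp(θ_n - δ_i1) / (1 + exp(θ_n - δ_i1)).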

For a well-performing questionnaire, the difficulty of the items is expected to match the level of ability of the measured population, i.e. the questionnaire should be neither too easy nor too difficult. Statistically, good targeting (assumption #1) is achieved if the mean item difficulty and the mean person ability are both close to zero. A Person Separation Index (PSI) above 0.8 indicates good reliability of the scale, and a value above 0.9 very good reliability (assumption #2). The PSI indicates how well the scale can discriminate between levels of functioning in the population. Cronbach’s α, which is typically also reported, is a classical measure of the internal consistency of the data, i.e. how well the items work together to describe one construct (Nunnally and Bernstein, 1994[4]). In the presence of disordered response options (assumption #3), an analysis of the response probability curves makes it possible to identify which response options cause problems and to decide on strategies for collapsing them. For example, if for an item the response options 1 and 2 appear reversed, indicating that the item cannot discriminate an increase in difficulty between them, the item responses can be recoded so that these two options form a single response level. LID often occur when items are redundant and measure approximately the same aspect of a construct (assumption #4). The most widely reported statistic for item dependencies is the correlation matrix of the Rasch residuals (Yen, 1984[5]). Residual correlations above 0.2 are considered unacceptable; one way to address such local item dependencies without deleting items is to aggregate (i.e. sum up) the correlated items into so-called testlets (Yen, 1993[6]). Within testlets, the thresholds are no longer expected to be ordered. For good item fit (assumption #5), infit and outfit values are expected to be below 1.2 (Smith, Schumacker and Bush, 1998[7]). The outfit statistic is more sensitive to outliers than the infit statistic.

Ideally, the items of a questionnaire should be fair and not favour subgroups of the sample. DIF analysis makes it possible to flag exogenous variables, or DIF variables (assumption #6), that lead to a lack of invariance of item difficulty (Holland and Wainer, 1993[8]). Note that DIF does not always indicate a measurement bias; it can also simply reflect subgroups with unequal underlying ability (Boone, Staver and Yale, 2014[9]). DIF analysis was conducted for age and gender to identify the items that are sensitive to these external covariates. Finally, a questionnaire should measure only one construct. If a questionnaire turns out to have several separate dimensions, the validity of a single summary total score is not supported. Unidimensionality (assumption #7) was assessed with a principal component analysis of the Rasch residuals (Smith, 2002[10]). Typically, a first eigenvalue lower than 1.8 is deemed indicative of unidimensionality. Based on simulation analyses, Smith and Miao (1994[11]) suggested considering the size of the second component instead, with values below 1.4 indicative of unidimensionality.
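The item-fit and local-dependency criteria above can be sketched with NumPy. This is a minimal illustration, assuming the observed ratings, model-expected values, and model variances (persons in rows, items in columns) have already been obtained from a fitted Rasch model; the fitting itself is not shown, and all function names are hypothetical. Outfit is the unweighted mean of squared standardised residuals, while infit weights squared residuals by the model variance, which is why outfit reacts more strongly to outliers.

```python
import numpy as np


def standardized_residuals(observed, expected, variance):
    """(x - E[x]) / sqrt(Var[x]) for every person (row) and item (column)."""
    return (observed - expected) / np.sqrt(variance)


def item_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares per item (column)."""
    z = standardized_residuals(observed, expected, variance)
    # Outfit: unweighted mean of squared standardised residuals.
    outfit = np.mean(z ** 2, axis=0)
    # Infit: squared raw residuals weighted by the model variance (information).
    infit = np.sum((observed - expected) ** 2, axis=0) / np.sum(variance, axis=0)
    return infit, outfit


def misfitting_items(infit, outfit, cutoff=1.2):
    """Indices of items whose infit or outfit exceeds the cut-off."""
    return [i for i, (a, b) in enumerate(zip(infit, outfit)) if a > cutoff or b > cutoff]


def dependent_item_pairs(residuals, cutoff=0.2):
    """Item pairs (i, j) whose residual correlation exceeds the cut-off."""
    r = np.corrcoef(residuals, rowvar=False)
    n_items = r.shape[0]
    return [(i, j) for i in range(n_items) for j in range(i + 1, n_items) if r[i, j] > cutoff]


# Illustrative synthetic residuals: item 1 shares variance with item 0,
# item 2 is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.5 * rng.normal(size=500)
c = rng.normal(size=500)
print(dependent_item_pairs(np.column_stack([a, b, c])))
```

Pairs flagged this way are the candidates for aggregation into testlets.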

The Rasch analysis of the Italian dataset showed that the scale is multidimensional, with a strong tendency for items to load together (i.e. to correlate with one another) within WHODAS domains. Only a few items loaded across domains and, similarly, only a few items were free of dependencies. To resolve the multidimensionality and the local item dependencies, correlated items were aggregated according to the domain structure of the WHODAS questionnaire. The findings can be summarised as follows:

  1. The population included in this analysis was very well targeted by the scale.

  2. Scale reliability was high but initially inflated by item dependencies (PSI = 0.95, Cronbach’s α = 0.95). Reliability remained good after the adjustments were made (PSI = 0.88, Cronbach’s α = 0.89).

  3. The response thresholds of 23 of the 32 WHODAS items presented disordering. Local item dependencies can explain the disordering, as can a lack of discrimination between the first two response options, i.e. answer categories “None” and “Mild”.

  4. The analysis of the residual dependencies showed strong local dependencies among most WHODAS items, with a tendency for items from the same domain to associate. To address these dependencies, items were aggregated according to the domain structure of the tool. The thresholds of the resulting testlets are not expected to be ordered.

  5. Item fit is considered good if infit and outfit values are below 1.2. Three of the 32 items showed misfit, with infit or outfit above the cut-off: D1.5 (Generally understanding what people say), D6.4 (How much time did you spend on your health condition or its consequences), and D6.6 (How much has your health been a drain on the financial resources of you or your family). After aggregation of the items by domain, all testlets showed good infit and outfit values, below 1.2.

  6. The DIF analysis indicated that all WHODAS domains are sensitive to age. Responses to domain 1 (Cognition – Understanding and communicating) and domain 5(1) (Life activities – Taking care of the household) are also affected by the respondent’s gender.

  7. The principal component analysis indicated that items cluster by domain, resulting in multidimensionality, with a very high 1st eigenvalue of 5.29 and a 2nd eigenvalue of 2.87. After adjustment, i.e. aggregation of items by WHODAS domain, the 1st eigenvalue dropped to 1.93 and the 2nd eigenvalue to 1.29. The 2nd eigenvalue is below the 1.4 cut-off suggested by Smith and Miao, indicating unidimensionality by that criterion, although the 1st eigenvalue remains slightly above 1.8.
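The two adjustment and checking steps in findings 4 and 7 can be sketched as follows: summing items into one testlet per domain, and comparing the eigenvalues from a principal component analysis of the residuals with the two cut-offs cited earlier. This is an illustrative sketch: the item-to-domain mapping is a hypothetical input, the function names are hypothetical, and re-fitting the Rasch model on the testlets is not shown.

```python
import numpy as np


def aggregate_testlets(item_scores, domains):
    """Sum item columns into one testlet score per domain.

    `domains` maps a domain label to the column indices of its items
    (a hypothetical mapping supplied by the analyst).
    """
    return {name: item_scores[:, cols].sum(axis=1) for name, cols in domains.items()}


def residual_eigenvalues(residuals):
    """Eigenvalues, largest first, of the correlation matrix of Rasch residuals."""
    corr = np.corrcoef(residuals, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order


def unidimensionality_checks(first, second, first_cutoff=1.8, second_cutoff=1.4):
    """Compare the first two eigenvalues with the two cut-offs from the text."""
    return {"first_eigenvalue_ok": first < first_cutoff,
            "second_eigenvalue_ok": second < second_cutoff}


# Eigenvalues reported for the Italian data, before and after aggregation
before = unidimensionality_checks(5.29, 2.87)
after = unidimensionality_checks(1.93, 1.29)
print(before)  # {'first_eigenvalue_ok': False, 'second_eigenvalue_ok': False}
print(after)   # {'first_eigenvalue_ok': False, 'second_eigenvalue_ok': True}
```

Applied to the reported values, only the second-eigenvalue criterion of Smith and Miao is met after aggregation, which matches the qualified reading of finding 7.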

In conclusion, psychometric testing of the data piloted in Italy confirmed the validity and reliability of WHODAS in the Italian context: functioning data collected with WHODAS display robust psychometric properties. It is important to keep in mind that the WHO developed WHODAS explicitly to statistically capture the construct of functioning from the perspective of performance, i.e. the experience of performing activities by a person with an underlying health problem in their everyday environment. Given these satisfactory psychometric properties, one can confidently conclude that information collected with the WHODAS questionnaire is robust, viable, and relevant, and that it validly represents the construct of disability as understood in the ICF and the UN Convention on the Rights of Persons with Disabilities (UNCRPD). Including the WHODAS questionnaire in disability status assessment in Italy would therefore (i) significantly strengthen the assessment method currently in use (a medical assessment based on the existence of impairments) and align it with Italy’s general approach to disability; (ii) bring it closer to the ICF and UNCRPD understanding of disability; and (iii) harmonise the assessment approach with the ICF functioning-based approach used in subsequent individual needs assessments.

There are no agreed and published cut-offs for the WHODAS score that would be applicable to a population with diverse health conditions to categorise the severity of disability. Established cut-offs would make it possible to identify individuals with significant disabilities and to review and, eventually, reconsider the civil invalidity percentages attributed to them. Some studies report the 90th or 95th percentile of the WHODAS score distribution as the best cut-off for identifying severe disability or dysfunctionality in specific groups, such as post-partum women (Mayrink et al., 2018[12]) or the elderly population (Ferrer et al., 2019[13]). A minimal clinically important difference in WHODAS scores has not yet been established (Federici et al., 2016[14]). However, based on several previous, comparable pilot projects conducted by the World Bank with the WHODAS questionnaire in Greece, Latvia, Lithuania, and Bulgaria, the following disability cut-off points are suggested for the Rasch-based 0-100 WHODAS score:

  • Score 0-25: No functioning restrictions (i.e. no difficulties in performance/disability)

  • Score 26-40: Moderate functioning restrictions (i.e. moderate difficulties in performance/disability)

  • Score 41-60: Severe functioning restrictions (i.e. severe difficulties in performance/disability)

  • Score 61-100: Very severe functioning restrictions (i.e. very severe difficulties in performance/disability)

A score of 40 would thus be a central cut-off for determining the presence of a disability and, therefore, eligibility for services. In total, the sample included N = 74 (2.3%) individuals with no functioning restrictions, N = 972 (30.0%) with moderate functioning restrictions, N = 2 120 (65.4%) with severe functioning restrictions, and N = 76 (2.3%) with very severe functioning restrictions.

Later in this chapter, additional cut-offs are introduced to split the two middle groups in which most people are concentrated – thereby distinguishing lower and higher moderate functioning restrictions (with WHODAS scores of 26-34 and 35-40, respectively) as well as lower and higher severe functioning restrictions (with WHODAS scores of 41-48 and 49-60, respectively).
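The suggested bands, including the finer split of the two middle groups, amount to a simple lookup. The sketch below is illustrative: the function names are hypothetical, and treating 25, 40, 48, and 60 (and 34) as inclusive upper bounds is an assumption, since the text lists integer ranges while Rasch-based scores may be non-integer. The second function reproduces the shares reported above from the group counts.

```python
from typing import Dict


def whodas_group(score: float, fine: bool = False) -> str:
    """Map a Rasch-based 0-100 WHODAS score to the suggested disability groups.

    Assumption: band limits are inclusive upper bounds, so a non-integer
    score such as 25.5 falls in the moderate group.
    """
    if score <= 25:
        return "none"
    if score <= 40:
        if fine:
            return "lower moderate" if score <= 34 else "higher moderate"
        return "moderate"
    if score <= 60:
        if fine:
            return "lower severe" if score <= 48 else "higher severe"
        return "severe"
    return "very severe"


def percentage_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total in each group, rounded to one decimal."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


print(whodas_group(40), whodas_group(41), whodas_group(36, fine=True))
# moderate severe higher moderate
print(percentage_shares({"none": 74, "moderate": 972, "severe": 2120, "very severe": 76}))
# {'none': 2.3, 'moderate': 30.0, 'severe': 65.4, 'very severe': 2.3}
```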

The civil invalidity percentages attributed to persons with health problems in Italy following the assessment can be grouped into categories in various ways. While the discretionary assessment has no formal cut-off points, the entitlement thresholds for various benefits and supports suggest the following meaningful split:

  • 0-33%: no invalidity

  • 34-66%: moderate invalidity, of which

    • 34-45%: lower moderate invalidity

    • 46-66%: higher moderate invalidity

  • 67-99%: severe invalidity, of which

    • 67-73%: lower severe invalidity

    • 74-99%: higher severe invalidity

  • 100%: very severe invalidity

In total, the pilot sample included N = 81 (2.8%) individuals with no civil invalidity, N = 1 129 (38.8%) with moderate civil invalidity, N = 1 076 (37.0%) with severe civil invalidity, and N = 623 (21.4%) with very severe civil invalidity rated at 100%; these shares refer to the N = 2 909 individuals with a reported percentage. There were N = 333 (10.3%) individuals in the data set with no reported civil invalidity percentage. The different levels of invalidity are key to obtaining support from Italy’s social protection system. For example, with a civil invalidity percentage of at least 46%, individuals can request employment support; from 67%, prostheses are provided free of charge; and from 74%, people can receive a non-contributory disability allowance.
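The civil invalidity categories listed above, and the way the reported shares are computed over cases with a known percentage, can be sketched as follows. This assumes integer percentages and uses hypothetical function names; the example list simply rebuilds the reported group counts, including the 333 missing values that are excluded from the denominator.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def invalidity_group(pct: Optional[int], fine: bool = False) -> Optional[str]:
    """Map a civil invalidity percentage (0-100) to the proposed categories."""
    if pct is None:
        return None  # no reported percentage
    if pct <= 33:
        return "none"
    if pct <= 66:
        if fine:
            return "lower moderate" if pct <= 45 else "higher moderate"
        return "moderate"
    if pct <= 99:
        if fine:
            return "lower severe" if pct <= 73 else "higher severe"
        return "severe"
    return "very severe"


def known_shares(groups: Iterable[Optional[str]]) -> Dict[str, float]:
    """Percentage per group among records with a known group (None excluded)."""
    known = [g for g in groups if g is not None]
    counts = Counter(known)
    return {g: round(100 * n / len(known), 1) for g, n in counts.items()}


groups = (["none"] * 81 + ["moderate"] * 1129 + ["severe"] * 1076
          + ["very severe"] * 623 + [None] * 333)
print(known_shares(groups))
# {'none': 2.8, 'moderate': 38.8, 'severe': 37.0, 'very severe': 21.4}
```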

Table 5.3 presents the socio-demographic characteristics of the sample disaggregated by level of disability based on the WHODAS score. At 68.9%, the share of men was highest in the group with no disability, and close to or below 50% in the other groups. Mean age increases significantly across disability levels (p-value < 0.001), from 45.7 years for those with no disability to 53.5 years for those with very severe disability. The average number of years of education decreases significantly with increasing disability (p-value < 0.001), from about 12 years with no disability to about 11 years with very severe disability. Regarding the living situation, 77.3% of participants with very severe disability lived independently in the community, compared with shares above 90% in all other groups. The percentage of persons in paid work decreased from 56.8% in the group with no disability to 21.1% for those with very severe disability.

Table 5.4 presents the socio-demographic characteristics of the sample disaggregated by level of civil invalidity, following the cut-off categories proposed above. The share of men exceeds 50% only in the group with no civil invalidity. Again, mean age increases significantly across degrees of civil invalidity (p-value < 0.001), from 45.2 years in the group with no invalidity to 52.9 years in the group with very severe civil invalidity. The average number of years of education is slightly above 11 years at all invalidity levels. The share of people living independently in the community is 85.2% among those with very severe invalidity and above 90% in the other groups. Finally, the percentage of persons in paid work decreases from about 44.4% in the groups with no or moderate civil invalidity to 32.4% in the group with very severe civil invalidity.

Table 5.5 presents the mean WHODAS score on the 0-100 scale, disaggregated by health condition, together with the distribution of the population across ICD-11 chapters. Individuals with “Symptoms, signs or clinical findings, not elsewhere classified” had the highest mean WHODAS score (46.66). The least disabling conditions as measured by WHODAS are developmental anomalies, with a mean score of 40.8. Among the four most frequent condition groups, “mental, behavioural or neurodevelopmental disorders” has the highest mean WHODAS score (44.95), while the other three (neoplasms, diseases of the circulatory system, and diseases of the musculoskeletal system) all have mean scores of around 43.

Table 5.6 disaggregates the sample by pathology and degree of civil invalidity. By and large, mean WHODAS scores tend to increase with the invalidity degree for most pathologies, although the results must be interpreted with caution because of the small number of cases in the group with no invalidity (N = 81). No single condition consistently has the highest mean WHODAS score across the civil invalidity groups. Looking only at the four main pathologies, for which the sample size is large enough to draw reliable conclusions, the following can be observed:

  • Diseases of the musculoskeletal system are the dominant pathology among people with a moderate level of civil invalidity (25.5% of those with degrees 34-66%). For those diseases, mean WHODAS scores clearly and gradually increase with the invalidity degree, from around 38.1 to 49.8.

  • Neoplasms are the dominant pathology among people with very severe levels of invalidity (38.5% of those with a degree of 100%). Mean WHODAS scores are lower than for the other main diseases, at all invalidity levels with degrees above 33%.

  • Diseases of the circulatory system are particularly frequent in the two middle invalidity categories, moderate and severe invalidity (i.e. degrees 34-99%). Mean WHODAS scores generally lie between those for neoplasms and for diseases of the musculoskeletal system.

  • The share of mental, behavioural, or neurodevelopmental disorders increases slightly with the invalidity degree, and their mean WHODAS scores are high compared with the other main diseases.

  • The mean WHODAS scores increase with the invalidity degree for all four main pathologies.

Table 5.7 looks at the mean WHODAS score and the mean civil invalidity percentage per ICD chapter, comparing cases where the linked health condition chapter was the only diagnostic information with cases where it was reported alongside other health condition chapters, i.e. comparing single morbidity with comorbidity. The average WHODAS score per ICD chapter hardly changes whether the chapter is a single diagnosis or part of multiple diagnoses. In contrast, the average civil invalidity percentage is in many cases higher when a person is diagnosed with multiple conditions. In other words, the WHODAS score per ICD chapter varies considerably less than the civil invalidity percentage: comorbidity appears to influence the civil invalidity percentage but not the WHODAS score. The data do not allow a definitive interpretation of this finding, but the discretion in the civil invalidity assessment could play a role, with assessors perceiving people with comorbidity as having a more severe disability, a perception not corroborated by the corresponding WHODAS scores.
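The single-morbidity versus comorbidity comparison can be sketched with the standard library. This assumes each record holds the person's linked ICD-11 chapters, a WHODAS 0-100 score, and a civil invalidity percentage, and that records without a percentage have already been filtered out; the function name and the three example records are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[List[str], float, float]  # (ICD-11 chapters, WHODAS score, invalidity %)


def means_by_morbidity(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean WHODAS score and mean invalidity % per (chapter, 'single'|'multiple')."""
    groups: Dict[Tuple[str, str], List[Tuple[float, float]]] = defaultdict(list)
    for chapters, whodas, pct in records:
        distinct = set(chapters)
        kind = "single" if len(distinct) == 1 else "multiple"
        # A comorbid person contributes to every chapter they have.
        for chapter in distinct:
            groups[(chapter, kind)].append((whodas, pct))
    return {key: (mean(w for w, _ in rows), mean(p for _, p in rows))
            for key, rows in groups.items()}


# Illustrative records (not pilot data)
records = [(["15"], 40, 50), (["15", "11"], 42, 75), (["15"], 44, 60)]
result = means_by_morbidity(records)
print(result[("15", "single")], result[("15", "multiple")])  # (42, 55) (42, 75)
```

In this toy example, the WHODAS mean for chapter 15 is the same with and without comorbidity while the invalidity mean differs, which is the pattern the table is designed to reveal.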

The following figures continue the comparison between the disability score based on the WHODAS questionnaire and the result of the civil invalidity assessment. Figure 5.5 compares the distribution of WHODAS scores with the distribution of civil invalidity percentages. While WHODAS disability scores are approximately normally distributed around a mean of 43.2, with a standard deviation of 8.5, civil invalidity percentages appear erratically distributed, with peaks at specific points on the continuum that coincide with the critical cut-offs for eligibility for specific social benefits and services. The discretionary method of assigning invalidity percentages, with limited guidelines and standards, might explain this concentration at the cut-offs. In practice, this turns the invalidity scale into an ordinal scale with just a few possible outcomes.
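One simple way to quantify the concentration of civil invalidity percentages at eligibility cut-offs is to count how often values fall exactly on those points. The sketch below is illustrative only: the set of cut-off values passed in is an analyst's choice, the function names are hypothetical, and the example list is made up.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def share_at_values(percentages: Iterable[Optional[int]], values: Set[int]) -> float:
    """Share of reported percentages that fall exactly on the given values."""
    valid = [p for p in percentages if p is not None]
    return sum(p in values for p in valid) / len(valid)


def most_frequent(percentages: Iterable[Optional[int]], k: int = 3) -> List[Tuple[int, int]]:
    """The k most frequent reported percentages with their counts."""
    return Counter(p for p in percentages if p is not None).most_common(k)


example = [46, 46, 67, 50, 100, 100, None]  # illustrative, not pilot data
print(share_at_values(example, {46, 67, 74, 100}))  # 5 of 6 reported values
print(most_frequent(example, k=2))
```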

Table 5.8 shows the four civil invalidity groups disaggregated by WHODAS disability group. In interpreting these findings, it is important to keep in mind that a moderate civil invalidity level does not necessarily correspond to a moderate disability level. These are two different perspectives that are only modestly correlated: WHODAS measures the lived experience of disability in the person’s everyday environment, whereas civil invalidity assesses disability based on the person’s impairment (medical approach). The table shows that the number of individuals in opposite severity groups is negligible: only one person has a very severe WHODAS disability but no civil invalidity, and no one has very severe civil invalidity and no WHODAS disability. However, less extreme but seemingly contradictory cases are more frequent: for example, 94 persons have very severe civil invalidity but only moderate WHODAS disability, and 40 persons have severe WHODAS disability but no civil invalidity.
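A cross-tabulation such as Table 5.8 can be sketched as a count over (civil invalidity group, WHODAS group) pairs. This assumes both group labels have already been assigned per person; the function names and the five example persons are hypothetical.

```python
from collections import Counter
from typing import Sequence


def crosstab(invalidity_groups: Sequence[str], whodas_groups: Sequence[str]) -> Counter:
    """Count persons per (civil invalidity group, WHODAS group) cell."""
    return Counter(zip(invalidity_groups, whodas_groups))


def opposite_extremes(table: Counter) -> int:
    """Persons in diametrically opposed cells (none vs very severe)."""
    # A Counter returns 0 for empty cells.
    return table[("none", "very severe")] + table[("very severe", "none")]


# Illustrative group labels for five persons (not pilot data)
inv = ["none", "very severe", "very severe", "moderate", "none"]
who = ["very severe", "moderate", "severe", "moderate", "severe"]
table = crosstab(inv, who)
print(table[("very severe", "moderate")], opposite_extremes(table))  # 1 1
```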

Figure 5.6 also compares individual civil invalidity percentages with WHODAS scores. It shows the full distribution of data points for the WHODAS score (y-axis) and the civil invalidity percentage (x-axis). Horizontal lines represent the cut-offs for the WHODAS score, from no disability to moderate, severe, and very severe disability, and vertical lines represent the cut-offs for the civil invalidity percentage (again no, moderate, severe, and very severe). The two scores are positively correlated, but only modestly (R = 0.33). This is expected because disability cannot be inferred from medical conditions or impairments alone: two individuals with the same medical diagnosis will be assigned the same percentage of disability based on the medical assessment criteria, yet they may experience different levels of disability (activity limitations and participation restrictions, i.e. performance in the ICF understanding of disability) depending on their environment.
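The report does not say which correlation coefficient R refers to; the sketch below assumes a Pearson correlation computed on the persons with both a civil invalidity percentage and a WHODAS score. It uses only the standard library, and the function names and example pairs are hypothetical.

```python
from math import sqrt
from typing import Iterable, Optional, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def invalidity_whodas_r(records: Iterable[Tuple[Optional[float], Optional[float]]]) -> float:
    """Correlate (invalidity %, WHODAS score) pairs, skipping incomplete pairs."""
    pairs = [(p, s) for p, s in records if p is not None and s is not None]
    pct, score = zip(*pairs)
    return pearson_r(pct, score)


print(invalidity_whodas_r([(10, 20), (None, 35), (20, 40), (30, 60)]))  # 1.0
```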

Some notable exceptions can be observed in the plot, such as individuals with 0% civil invalidity who report moderate to very severe disability on the WHODAS questionnaire, which looks at their functioning across different life domains. Similarly, some individuals with a civil invalidity percentage above 66% (i.e. with severe or very severe invalidity) have no disability according to their WHODAS score.

WHODAS functioning scores by current level of civil invalidity show that medical assessment alone does not differentiate well between levels of disability, which also suggests rather low reliability and precision of today’s civil invalidity ratings in Italy. Figure 5.7 shows the density lines of the WHODAS scores for the four levels of civil invalidity. While the WHODAS scores of the group with very severe civil invalidity (red line) stand out somewhat, the difference between the severe and moderate civil invalidity groups (orange and light green lines, respectively) appears to be very small. These density lines suggest the presence of both false positives (cases with a high invalidity percentage and a low WHODAS score) and false negatives (cases with a low invalidity percentage and a high WHODAS score). A more accurate assessment would also place the density line of the group with no or very low civil invalidity (dark green line) further to the left of the figure. Again, this suggests that medical information alone may misrepresent the true extent of the disability individuals experience in daily life.
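Possible false positives and false negatives of the kind described above could be flagged case by case. The sketch below is a hypothetical rule, not one proposed in the report: it borrows the "above 66%" and "no invalidity" (at most 33%) boundaries and the WHODAS "no restrictions" band (at most 25) from the cut-offs listed earlier, and all names and thresholds are illustrative parameters.

```python
from typing import Optional


def discordance_flag(invalidity_pct: Optional[float], whodas_score: Optional[float],
                     high_pct: float = 66, low_pct: float = 33,
                     no_disability_max: float = 25) -> Optional[str]:
    """Label cases where civil invalidity and WHODAS point in opposite directions.

    Hypothetical rule: 'possible false positive' if invalidity is above
    high_pct while WHODAS shows no disability; 'possible false negative' if
    invalidity is at most low_pct while WHODAS shows at least moderate disability.
    """
    if invalidity_pct is None or whodas_score is None:
        return None
    if invalidity_pct > high_pct and whodas_score <= no_disability_max:
        return "possible false positive"
    if invalidity_pct <= low_pct and whodas_score > no_disability_max:
        return "possible false negative"
    return None


print(discordance_flag(100, 20))  # possible false positive
print(discordance_flag(0, 45))    # possible false negative
print(discordance_flag(50, 45))   # None
```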

The results in Figure 5.7 come as no surprise: WHODAS was designed explicitly to assess so-called whole-person disability, whereas the medical approach used in Italy does not assess disability directly but infers it from the underlying health condition or impairment. Sometimes the severity of a health condition is closely correlated with the severity of the resulting disability, but sometimes it is not. The latter is best seen in the case of mental health problems, where the person’s environment may greatly amplify the impact of, say, depression. This is the basic validity problem of medically based disability assessment. As pointed out above, although the presence of a health condition and associated impairment is a precondition for disability, inferring the level of disability from the presence of the underlying health condition is scientifically problematic. As the ICF argues, the level of disability an individual experiences is determined by the interaction between the person’s health condition and associated impairments and the environment in which the person lives. WHODAS was designed to capture this disability experience directly, while an assessment based solely on medical grounds cannot do so validly or reliably.

The WHODAS pilot in Italy has shown that the tool performs well in capturing the actual experience of disability. The question is how best to include the functioning information captured by WHODAS in Italy’s system of disability status assessment. Medical information will remain relevant, because the ICF makes it clear that without an underlying health condition and associated impairments, disability does not exist. Information about health status provides the basis for identifying the specific physical and mental dimensions of activities and areas of participation that are vulnerable to disability. These can then be confirmed directly by the findings from the WHODAS questionnaire. Medical information also provides essential guidance on the medium- and long-term trajectory of the individual’s disability. This includes whether the person faces a progressive decline in health capacity resulting in more and more disability or, conversely, a progressive improvement. While medical information remains an essential component of disability assessment, the medical review itself must also change. It needs better standardisation and methodological guidelines, and could draw on the ICF classification of body functions and body structures.

Because medical information is essential, this section of the report discusses possible options for combining medical and functioning information in the assessment of disability in Italy, rather than replacing the current medical approach altogether with the WHODAS questionnaire. Several methods were tested on the pilot dataset to address this question. They can be grouped into three principal strategies:

(1) averaging the medical assessment percentage with the WHODAS score to arrive at a final disability assessment score;
(2) flagging persons whose WHODAS score and disability severity differ from the severity group based on the percentage determined by medical information alone; and
(3) scaling the civil invalidity percentage by a coefficient ‘x’ when the WHODAS score exceeds or falls below a certain threshold or reference value.

As WHODAS is used in Italy, more data will be collected. These data can be analysed with the techniques from this report to update and recalibrate parameters and cut-off points continually. In more detail, the three approaches work as follows:

  1. Averaging – averaging, in some predetermined way, the attributed civil invalidity percentage and the WHODAS score. This approach is based on the theory that medical information and functioning scores together contribute, to different degrees, to a realistic and valid assessment of disability.

  2. Flagging – identifying persons whose WHODAS score differs from the medically determined civil invalidity percentage, and flagging these individuals so that additional information, or even a full reassessment, can be requested. A WHODAS score above or below some cut-off value suggests that the medical score alone does not adequately capture the person’s experience of disability. In that case, a second-level assessment should be conducted.

  3. Scaling – the civil invalidity percentage can be altered (i.e. raised or lowered) by a score-based coefficient to reflect the WHODAS score. This approach assumes that the medical problem the individual experiences is at the core of disability and civil invalidity assessment. At the same time, it assumes that performance is modified, to some extent, by environmental factors, and that these must be understood in order to augment or diminish the medical score.

Averaging, flagging, and scaling are three of several potential approaches to bringing together two scores that measure different phenomena but which, together, constitute our best assessment of disability. Each approach is grounded in the ICF’s understanding of disability as the outcome of an interaction between a person’s underlying health condition and impairment on the one hand and the physical, human-built, interpersonal, attitudinal, social, economic, and political environment in which the person lives on the other hand. The three approaches differ, however, in how they weigh the impact of the respective medical and environmental determinants of disability. The next section describes the results of applying strategies that were tested using different weighting combinations.

This section presents in more detail the options for including functioning in disability assessment in Italy. Each option follows the ICF in recommending a combination of medical and functioning assessment, with the latter provided by WHODAS. Option A is the situation in which WHODAS scores are considered, or disregarded, in a purely discretionary manner. Options B (averaging strategies), C (flagging strategies) and D (scaling strategies) are quantitative. Each option has advantages and disadvantages.

The framework for evaluating the pros and cons of every approach draws on key scientific principles that determine the credibility of any disability assessment process: validity (the extent to which the option relies on a true assessment of disability); reliability (the ability of the option to arrive at the same assessment of the same case by different assessors); transparency (the degree to which the assessment process and outcomes can be described and understood by all stakeholders); and standardisation (the extent to which the process resists distortion or alteration over time and across locations).

Option A is the option in which an individual or committee reviews medical scores and the WHODAS scores and makes a judgment about the extent of disability as the individual or committee sees fit. This is a purely discretionary option, surprisingly common in practice. This approach is subject to manipulation, lacks validity and reliability, and is utterly non-transparent. The option is given here as a contrast to the remaining options B, C, and D, but also, in fairness, because some countries continue to rely on this option for disability assessment (strategy #1). The authors of this report do not recommend this option. Numerous interactions with officers involved in disability assessment in different countries suggest that medical professionals involved in the assessment of disability are confident they can consider functioning and the experience of disability as part of the medical description of the applicant’s situation. One often hears medical assessors claim that they take functioning fully into account when examining medical records. One implicit result from the pilot is, however, that this assumption is not grounded in evidence.

Averaging, flagging, and scaling are quantitatively driven options, very different from Option A. In different ways and for different reasons, they not only satisfy the basic psychometric criteria of validity and reliability but also strive, to different degrees, to achieve transparency and standardisation.

In the Italian pilot WHODAS dataset, only a small share of persons (2.3%) report no functioning problems at all. Most of them had a moderate or severe degree of civil invalidity. Weighting the civil invalidity percentage with the WHODAS score would adjust invalidity levels to account, to some degree, for the observed and experienced disability level assessed by the WHODAS questionnaire. To give a full sense of the range of possible approaches under Option B, four weighting schemes are shown (strategies #2 to #5):

(i) 75% civil invalidity percentage and 25% WHODAS score;
(ii) 50% each;
(iii) 25% civil invalidity percentage and 75% WHODAS score; and
(iv) 0% civil invalidity percentage and 100% WHODAS score.

Strategy #5 therefore shows the result of using WHODAS alone.
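The averaging strategies can be sketched in a few lines of Python. This is a minimal illustration, not the computation used in the pilot. It assumes that both scores run from 0 to 100. It also assumes four civil invalidity groups of 0-33%, 34-66%, 67-99% and 100%; these band edges are inferred from the ranges quoted later in this section and should be read as an assumption. The function names are hypothetical.

```python
# Minimal sketch of the averaging strategies (#2 to #5).
# Assumptions: both scores are on a 0-100 scale, and the four civil
# invalidity groups are 0-33, 34-66, 67-99 and 100 (band edges assumed).

def severity_group(score: float) -> str:
    """Map a 0-100 score to one of four severity groups (assumed bands)."""
    if score >= 100:
        return "very severe"
    if score >= 67:
        return "severe"
    if score >= 34:
        return "moderate"
    return "none or low"


def averaged_score(ci_pct: float, whodas: float, whodas_weight: float) -> float:
    """Weighted average of the civil invalidity percentage and the WHODAS score."""
    if not 0.0 <= whodas_weight <= 1.0:
        raise ValueError("whodas_weight must lie between 0 and 1")
    return (1.0 - whodas_weight) * ci_pct + whodas_weight * whodas


# WHODAS weights corresponding to strategies #2 to #5 in the text.
STRATEGY_WEIGHTS = {2: 0.25, 3: 0.50, 4: 0.75, 5: 1.00}

if __name__ == "__main__":
    # Hypothetical applicant: 75% civil invalidity, WHODAS score of 30.
    for strategy, weight in STRATEGY_WEIGHTS.items():
        combined = averaged_score(75, 30, weight)
        print(f"#{strategy}: {combined:.2f} -> {severity_group(combined)}")
```

Take a hypothetical applicant with 75% civil invalidity and a WHODAS score of 30. The combined score falls as the WHODAS weight rises, from 63.75 under strategy #2 to 30 under strategy #5. Whether this changes the applicant's severity group depends on where the band edges lie, not on the arithmetic itself.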

Advantages of averaging: (i) An assessment of the level of functioning plays a significant role in the determination of eligibility for disability benefits so that the eligibility for benefits is not solely based on purely medical criteria. (ii) The averaging approach minimises the impact of the inherent psychometric problems with the civil invalidity percentage based on the Barema-based medical assessment. (iii) The assessment of the level of functioning is empirically and statistically verified. (iv) This option yields high levels of validity and reliability. (v) Merging the results of two assessments scaled by means of “weighted averages” is fully objective, transparent, and non-discretionary. (vi) The method is not sample-dependent.

Disadvantages of averaging: (i) There is potentially an infinite number of combinations of weighting schemes (i.e. “strategies”). Each affects the set of eligible applicants differently and has different budgetary and political consequences. This is unavoidable, because disability is a continuum and there are not yet any scientifically verified or objective cut-offs for severity on a 0-100 continuum. (ii) Any strategy selected will be objectionable to individuals who, under that strategy, will not be certified as having a disability and thus will not be eligible for any benefits. This signals the need for clear and transparent information dissemination and a solid grievance redress system, which may include tools for clinical testing and determination of functioning.

Six different flagging approaches are represented by strategies #6 to #11. The idea is to highlight individuals whose civil invalidity percentage is unexpected in view of their WHODAS score. A conservative approach would flag individuals with scores at the upper (or lower) extreme of the sample's WHODAS score distribution who have a very small (or very large) civil invalidity percentage (#6). The next four strategies (#7 to #10) do not use the sample distribution. Instead, they use the distribution of scores within civil invalidity degree groups to identify cases for a potential increase or decrease of the invalidity percentage. Strategy #11 combines strategies #7-#10 and considers all cases that fall into one of these groups.
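Within-group flagging, in the spirit of strategies #7 to #10, can be illustrated with a short Python sketch. The specifics are hypothetical assumptions, not the pilot's parameters: the four civil invalidity groups (0-33%, 34-66%, 67-99%, 100%) and the use of the 10th and 90th WHODAS percentiles within each group as cut-offs. The function names are likewise illustrative.

```python
import statistics

# Hypothetical sketch of within-group flagging (in the spirit of #7 to #10).
# Assumptions: four civil invalidity groups (0-33, 34-66, 67-99, 100) and
# cut-offs at the 10th and 90th WHODAS percentiles within each group.

def severity_group(ci_pct: float) -> str:
    if ci_pct >= 100:
        return "very severe"
    if ci_pct >= 67:
        return "severe"
    if ci_pct >= 34:
        return "moderate"
    return "none or low"


def within_group_cutoffs(records):
    """Return {group: (p10, p90)} of WHODAS scores within each invalidity group.

    records: iterable of (civil_invalidity_pct, whodas_score) pairs.
    """
    scores_by_group = {}
    for ci_pct, whodas in records:
        scores_by_group.setdefault(severity_group(ci_pct), []).append(whodas)
    cutoffs = {}
    for group, scores in scores_by_group.items():
        if len(scores) < 2:
            continue  # quantiles need at least two observations
        deciles = statistics.quantiles(scores, n=10, method="inclusive")
        cutoffs[group] = (deciles[0], deciles[-1])  # 10th and 90th percentiles
    return cutoffs


def flag(ci_pct, whodas, cutoffs):
    """Return a review reason if WHODAS is unexpected for the CI group, else None."""
    group = severity_group(ci_pct)
    if group not in cutoffs:
        return None  # no within-group reference distribution available
    low, high = cutoffs[group]
    if whodas > high and group != "very severe":  # no higher group exists
        return "review for upshift"
    if whodas < low and group != "none or low":  # no lower group exists
        return "review for downshift"
    return None
```

Because these cut-offs are derived from the data, they would shift as more WHODAS data are collected. This fits the continual recalibration anticipated earlier in this section.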

Advantages of flagging: (i) Scientifically robust and based on actual data. (ii) Shows that the purely medical approach to disability assessment may not accurately assess disability in many cases – in which, as reported in the WHODAS score, a person is experiencing more, or fewer, functioning problems in their lives than what the health condition is thought to imply. (iii) High levels of validity and reliability.

Disadvantages of flagging: (i) WHODAS cut-offs for different degrees of functioning problems are based on experience from past pilots and some evidence from the scientific literature. Sensitivity analyses are not yet available. More precise cut-off values specific to Italy may be introduced at a later stage, once more information on functioning has been collected (assuming WHODAS is introduced into the existing system). (ii) Technically robust methodological and procedural instructions will have to be developed to guide the reassessment process and ensure transparency.

Even with the caveat on the cut-off points for disability severity, the flagging method may be introduced through a specifically designed two-step administrative procedure.

The scaling approach, represented by strategies #12 and #13, reproduces a practice used in some form in some countries (e.g. Lithuania), though generally in a rather opaque way. Under this practice, the civil invalidity percentage assigned by a disability assessment committee is modified by a coefficient representing functioning information (e.g. derived from a WHODAS score). The idea behind this approach is to avoid relying exclusively on a medical determination of disability, as such an approach undervalues the actual impact of health conditions on a person’s life and functioning performance.

Two strategies are used to illustrate the scaling approach; in theory, there are many other possibilities. The first strategy selects individuals with high disability according to their WHODAS score, i.e. above the WHODAS cut-offs of 40 and 60. It raises their civil invalidity percentage by a coefficient of either 1.25 (WHODAS scores above 40) or 1.5 (WHODAS scores above 60). Conversely, the second strategy selects individuals with very low disability according to their WHODAS score, i.e. below the WHODAS cut-offs of 40 and 25. It reduces their civil invalidity percentage by a coefficient of either 0.95 (WHODAS scores below 40) or 0.9 (WHODAS scores below 25). The choice of coefficients here is driven, to some extent, by the objective of achieving a similar impact in both directions.
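The two scaling rules can be written down directly from the cut-offs and coefficients above. The Python sketch below is illustrative only. Two details are assumptions not stated in the text: treating the cut-offs as strict inequalities, and capping scaled percentages at 100. The function names are hypothetical.

```python
# Hypothetical sketch of scaling strategies #12 and #13, using the cut-offs
# and coefficients quoted in the text. Assumptions: strict inequalities at
# the cut-offs, and results capped at 100.

def scale_up(ci_pct: float, whodas: float) -> float:
    """Strategy #12: raise CI by 1.5 if WHODAS > 60, or by 1.25 if WHODAS > 40."""
    if whodas > 60:
        factor = 1.5
    elif whodas > 40:
        factor = 1.25
    else:
        factor = 1.0
    return min(100.0, ci_pct * factor)


def scale_down(ci_pct: float, whodas: float) -> float:
    """Strategy #13: lower CI by 0.9 if WHODAS < 25, or by 0.95 if WHODAS < 40."""
    if whodas < 25:
        factor = 0.9
    elif whodas < 40:
        factor = 0.95
    else:
        factor = 1.0
    return ci_pct * factor


def scale_combined(ci_pct: float, whodas: float) -> float:
    """Strategies #12 and #13 together; their WHODAS ranges do not overlap."""
    return scale_down(scale_up(ci_pct, whodas), whodas)
```

Take, for instance, a person rated at 74% with a WHODAS score of 30. This person would move to about 70.3%, just below the 74% lower bound of the higher severe band. The example shows how a small downward coefficient is enough to shift a rating when many ratings sit just above a threshold.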

Advantages of scaling: (i) Using a coefficient value generated statistically is a common and widely used approach. (ii) A coefficient approach (increasing or reducing the medically-determined civil invalidity percentage considering the corresponding functioning score) is the most intuitive way to combine the scores of very different assessments – medical and functioning – into a single score. (iii) This option incorporates the insight that a medical determination alone can often miss instances where people have only moderate or very high disability needs. (iv) This option, because of the psychometric properties of WHODAS, would have high levels of validity and reliability.

Disadvantages of scaling: (i) As with the other options, there are many possible variations of Option D, each with different outcomes; this report presents only two, as illustrative examples. (ii) The scaling approach itself is intuitive and can be made transparent to the public. However, the scientific and statistical justification for Option D is somewhat technical and may not be easily understood by a lay public.

Table 5.9 provides an overview of the testing strategies that were considered and gives the number of individuals who would have a moderate, severe, or very severe disability after adjusting for the WHODAS score. Further, and maybe most importantly, the table also shows the number of individuals who would have their civil invalidity severity ranking changed towards a higher degree (total upshifts) or a lower degree (total downshifts). In brief, the results are as follows:

  • The four averaging strategies show that using WHODAS generally produces more upshifts to higher invalidity degrees than downshifts. Giving WHODAS a weight of 25% (strategy #2) changes little: it affects only 2.5% of the sample, and most of those affected would see a downshift. These are people just above one of the invalidity thresholds who seem to function well, perhaps because their environment is supportive and their needs are addressed. The more weight WHODAS receives, the more people are affected and the more upshifts occur. With a 50% weight for both WHODAS and civil invalidity (strategy #3), 8.5% of the sample would be affected, with an equal number of upshifts and downshifts. With WHODAS only (strategy #5), 42% of the sample would see a change in invalidity severity, two-thirds of them an upshift. Most upshifts are shifts from moderate to severe invalidity, potentially generating more eligibility for a disability allowance. By contrast, the number of people with very severe invalidity would fall drastically, from over 20% to only 2% of the sample. These are the people considered non-self-sufficient and thus in need of constant care. This suggests that the current medically based assessment may overestimate the degree of disability, and that policies may be setting the wrong priorities and incentives.

  • The six flagging strategies show that very few people currently receive an invalidity rating that is drastically different from their actual disability experience as measured by WHODAS. Only 2% of the sample have extremely low or extremely high WHODAS scores (strategy #6). Only 5.5% of the sample would be flagged as having an invalidity rating very different from their WHODAS score (strategy #11). Of those 5.5%, two-thirds could see a downshift in their current severity rating, depending on the result of the second assessment. Most of them are people classified with 100% civil invalidity who experience much less disability. (For supplementary flagging variants, see section 6.3.)

  • With the coefficients chosen for the two scaling strategies, over 8% of the sample would see their invalidity rating increased because of (very) severe disability according to WHODAS (strategy #12). Similarly, close to 8% would see their rating lowered because of no or only moderate disability according to WHODAS (strategy #13). The upward and downward coefficients differ greatly in size because of how invalidity is currently assessed and rated: so many people are found just above the next invalidity threshold. A clear disadvantage of strategy #12 is that it increases the already large number of people with a very severe invalidity rating. Combining strategies #12 and #13 would mean that 16% of the sample see their rating changed.
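The upshift and downshift counts reported in Table 5.9 amount to comparing each person's severity group before and after adjustment. Below is a minimal Python sketch of that tally. It assumes four severity groups with lower bounds of 0%, 34%, 67% and 100%, inferred from the ranges quoted in this section. The function names and toy data are hypothetical.

```python
# Hypothetical sketch: share of cases moving to a higher or lower severity
# group after an adjustment. Group lower bounds (34, 67, 100) are assumed.

def severity_rank(score: float) -> int:
    """0 = none or low, 1 = moderate, 2 = severe, 3 = very severe."""
    if score >= 100:
        return 3
    if score >= 67:
        return 2
    if score >= 34:
        return 1
    return 0


def shift_summary(before, after):
    """Return shares of upshifts, downshifts and affected cases.

    before, after: equal-length sequences of 0-100 scores for the same people.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty sequences of equal length")
    ups = downs = 0
    for old, new in zip(before, after):
        change = severity_rank(new) - severity_rank(old)
        if change > 0:
            ups += 1
        elif change < 0:
            downs += 1
    n = len(before)
    return {"upshifts": ups / n, "downshifts": downs / n, "affected": (ups + downs) / n}


if __name__ == "__main__":
    # Toy data: four people before and after some adjustment.
    print(shift_summary([50, 70, 100, 30], [70, 60, 90, 30]))
```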

The pilot evaluation suggests that the current disability assessment system in Italy would benefit from the inclusion of functioning information into the assessment method in at least three ways:

  • the assessment of disability would be more precise and accurate, reflecting the real-life experience of disability and identifying some people who are not well identified by a purely medical approach;

  • the assessment would be in line with today’s interdisciplinary understanding of disability, to which Italy committed 14 years ago when it ratified the UN Convention on the Rights of Persons with Disabilities; and

  • the assessment would be harmonised with, and provide more valuable input into, any subsequent individual assessment of the actual support needs of people with disability.

The approach suggested for disability assessment is to combine medical and functioning information in some transparent form. While there are in principle many alternative methodological options for doing this, for Italy flagging the need for a second assessment seems to be the most meaningful and realistic way forward. This is so because the current process of civil invalidity assessment through which applicants are assigned an invalidity degree, or percentage, is strongly influenced and biased by the various thresholds in place for eligibility to various entitlements, benefits, and services. Therefore, while in theory people could be assigned any percentage, in practice most applicants for a civil invalidity assessment return with a degree close to, or at, one of the critical thresholds. Technically speaking, the current assessment returns ordinally scaled disability degrees determined by the existing thresholds rather than interval-scaled degrees that reflect the degree of the person’s impairment. The consequence of this is that quantitative approaches like scaling or averaging can generate undesirable results on both ends of the spectrum. People sitting just at a threshold would easily fall below the threshold and, thus, lose critical disability entitlements; people far away from a threshold might receive a significantly higher invalidity percentage but without any change in the type of service or benefit they are entitled to.
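The threshold problem can be made concrete with a small Python sketch that applies 50/50 averaging to two invented cases. The band lower bounds of 34%, 46%, 74% and 100% are taken from the ranges quoted later in this section. The 67% lower bound for the lower severe band is an assumption. The cases are illustrative, not pilot data.

```python
# Hypothetical illustration of how averaging interacts with fixed thresholds.
# Band lower bounds 34, 46, 74 and 100 are quoted in the text; 67 is assumed.
BANDS = [
    (100, "very severe"),
    (74, "higher severe"),
    (67, "lower severe"),
    (46, "higher moderate"),
    (34, "lower moderate"),
    (0, "none or low"),
]


def band(pct: float) -> str:
    """Return the name of the band containing a 0-100 percentage."""
    for lower_bound, name in BANDS:
        if pct >= lower_bound:
            return name
    raise ValueError("percentage must be between 0 and 100")


def average_50_50(ci_pct: float, whodas: float) -> float:
    """Equal-weight averaging, as in strategy #3."""
    return 0.5 * ci_pct + 0.5 * whodas


if __name__ == "__main__":
    for ci_pct, whodas in [(74, 70), (46, 64)]:
        new = average_50_50(ci_pct, whodas)
        print(f"{ci_pct}% ({band(ci_pct)}) + WHODAS {whodas} -> {new}% ({band(new)})")
```

Consider first an applicant rated at 74% with a WHODAS score of 70, which signals considerable disability. Under these assumptions, this applicant would drop to 72% and leave the higher severe band. By contrast, an applicant at 46% with a WHODAS score of 64 would gain nine points but stay in the higher moderate band.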

A related reason why the averaging approach in particular has limited applicability in Italy is the discretionary nature of Italy’s civil invalidity assessment. Although the assessment is intrinsically medical, assessors can take people’s actual situation into account if they wish. In a discretionary and non-transparent way, they can increase the assigned invalidity percentage in line with any “perceived” functioning limitations – perceived, because this is done without any basis or tool for assessing functioning. This problem is linked to the observation that system thresholds seem to influence the assessment outcome. By contrast, averaging would be a highly promising and adequate approach if it were used to average two independent pieces of information: the medical and the functional aspects of disability. Italy could achieve such a situation if information on these two aspects were collected independently. The medical part of the assessment would also need to be performed in a standardised manner, with methodological guidelines applicable across the entire country.

If Italy chooses to move ahead with the introduction of a flagging algorithm, two aspects have to be addressed: the weight given to functioning information relative to medical information, and the structure of the entire assessment process. The first question, on the relevance attached to functioning (i.e. the WHODAS score), amounts to asking how many cases “should” be flagged. Even with strategy #11, which combines strategies #7-#10, only about 5.5% of all applicants would be considered for a second assessment. The remaining 94.5% would not be affected by such a reform. That is a very low share. It (i) does not do justice to the importance of people’s actual disability experience, (ii) hardly justifies a comprehensive reform, (iii) would likely fail to change everyone’s mindset towards a modern view of disability and functioning, and (iv) would ultimately have little effect on the adequacy and effectiveness of disability supports.

It is, therefore, useful to think about ways to increase the number of flagged cases. This means questioning, and thus reassessing, not only extreme differences between the civil invalidity percentage and the WHODAS score but also smaller differences between the medical and the functional view. For this purpose, a finer grid of civil invalidity thresholds is useful. It also distinguishes lower from higher moderate invalidity and lower from higher severe invalidity, thereby creating six invalidity categories. Similarly, the following exercise splits the moderate and severe disability groups, as measured by the WHODAS score, into two subcategories each, also creating six disability categories. The following two supplementary strategies show the range of options open to Italy.

The first supplementary strategy selects for a second assessment every case in which the medically determined category on the six-category civil invalidity scale differs from the functionally determined category on the six-category WHODAS scale. Figure 5.8 shows the corresponding result. Cases marked in red and green are those for which the WHODAS score would imply a reassessment, with a potential downshift for cases in red and an upshift for cases in green. About one in four of the total pilot sample falls in the same category under both scales (cases marked in grey). All others would be considered for a reassessment: two-thirds of the flagged cases for a potential downshift to a lower invalidity rating and one-third for an upshift. Most potential downshifts concern people with a 100% civil invalidity rating (very severe) or a rating between 74% and 99% (higher severe). By contrast, most potential upshifts concern people with a higher moderate invalidity rating (46%-66%).

The second supplementary strategy is less strict: it tolerates a deviation of one category between the two scales. It selects for a second assessment only those cases in which the medically determined civil invalidity category differs from the functionally determined disability category by at least two categories. Figure 5.9 shows the result of this middle strategy, again marking in red and green the cases with a negative or positive discrepancy between the civil invalidity rating and the WHODAS score. In about 70% of the total pilot sample, the difference between the two scales is small enough that the assigned civil invalidity rating would remain untouched. The other 30% would be selected for a reassessment. Of those 30%, again, about two-thirds are candidates for a potential downshift and one-third for a potential upshift. In this case, most potential downshifts concern people with a 100% civil invalidity rating (very severe). Potential upshifts concern people with a lower or higher moderate invalidity rating (34%-45% or 46%-66%).
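Both supplementary strategies come down to comparing two category indices and tolerating a gap of zero or one category. The Python sketch below is hypothetical. The civil invalidity lower bounds follow the ranges quoted above, with 67% assumed for the lower severe band. The WHODAS lower bounds are placeholders invented for illustration, since the six-category WHODAS cut-offs are not specified here.

```python
# Hypothetical sketch of the two supplementary flagging strategies.
# Civil invalidity lower bounds: 34, 46, 74, 100 from the text; 67 assumed.
CI_LOWER_BOUNDS = [0, 34, 46, 67, 74, 100]
# WHODAS lower bounds: placeholder values for illustration only.
WHODAS_LOWER_BOUNDS = [0, 25, 33, 40, 50, 60]


def category(score: float, lower_bounds) -> int:
    """Index (0-5) of the highest category whose lower bound is <= score."""
    index = 0
    for i, bound in enumerate(lower_bounds):
        if score >= bound:
            index = i
    return index


def reassessment_flag(ci_pct: float, whodas: float, tolerance: int):
    """Return a flag if the two category indices differ by more than `tolerance`.

    tolerance=0 mimics the first supplementary strategy (any difference);
    tolerance=1 mimics the second (a difference of at least two categories).
    """
    gap = category(whodas, WHODAS_LOWER_BOUNDS) - category(ci_pct, CI_LOWER_BOUNDS)
    if abs(gap) <= tolerance:
        return None
    return "potential upshift" if gap > 0 else "potential downshift"
```

Under these placeholder cut-offs, a person rated at 100% with a WHODAS score of 45 is flagged for a potential downshift under both strategies. A person at 50% with a WHODAS score of 45 is flagged only under the stricter first strategy.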

There is no right or wrong choice of flagging approach, but the greater the importance attached to the WHODAS score, the more cases will be considered for reassessment. The two supplementary strategies are illustrative. Even so, the 30% identified in the second supplementary strategy could be a meaningful middle way for the Italian Government to consider. The thresholds underlying the selection of cases for reassessment are somewhat arbitrary at first. They would become increasingly robust over time as more data are collected through the new assessment process.

The second aspect to consider when introducing a flagging algorithm is the structure of the assessment process, i.e. the question of who assesses and who decides at which stage. Here, the Italian system has a great starting advantage, as INPS already approves and assigns the final disability rating today. This lends itself to a natural process. In a first step, the regional assessment committee assesses medical information, just as today, and local social workers assess functioning information, as in the regional pilots. These two independently collected pieces of information, the person’s impairment score and the person’s WHODAS score, are forwarded to INPS (or any other supervisory authority). INPS evaluates and compares the results and decides in which cases a reassessment is needed. This mirrors today’s process, except that it would be more transparent. It must also include everyone whose medical and functional scores deviate by more than the legislation allows. If the two scores are close enough, the determination is essentially automatic and INPS issues a decision on disability. People whose two scores deviate are considered for a second assessment. In this case, medical assessors and social workers should sit together, examine the case, and make a new joint proposal to INPS. This could be done by the medical assessors and social workers responsible for the initial evaluation, or by medical assessors and social workers from INPS (or the supervisory authority).

Of course, there are additional aspects to consider within the various components. For instance, assessing doctors would need better technical and methodological guidelines on how to translate impairments (via body functions and body structures) into invalidity percentages. This would eliminate the current level of discretion and ensure that people with the same type and level of impairment always receive the same invalidity percentage. Similarly, one could consider moving away from the interval scale and instead use only groups of impairment levels, such as those used in this report.

Italy certainly has the administrative capacity to implement such a change smoothly. It has a cadre of experienced social workers in both the health and the social sector who could be engaged in administering WHODAS. Most Italian regions also have an advanced information system that could easily accommodate the collection and use of functioning information from the WHODAS questionnaire, alongside information on impairment. A flagging approach results in a second, combined medical-functional assessment in selected cases. If an averaging or scaling approach were chosen instead, the procedure would be even easier, as much of the process could be automated. Whatever the ultimate choice, information on functioning would be systematically included in disability assessment using a standardised approach. The administrative process itself would also become more rigorous, standardised, and objective.

In implementing change, the Italian Government will have to consider two additional, political aspects. First, any new method adopted should probably be applied to new applicants only, to make sure the change is accepted by the population. Across the OECD, only very few countries (in particular, the Netherlands and the United Kingdom) have chosen to reassess current beneficiaries according to any new, reformed assessment method. Most OECD countries would, in such situations, choose to grandfather existing recipients; generally, it is considered fairer to leave existing entitlements unchanged despite the apparent inequality such an approach creates between those who were assessed before and after reform.

Second, it will be important to anticipate and manage the outcome of any reform. Whatever approach is chosen, some individuals will benefit from the reform and others will lose entitlements compared with the current situation. As cost neutrality is one of the conditions for reform, this issue is unavoidable. The importance given to the functioning component, relative to the medical information, will determine the size of the two groups. Alternatively, Italy could choose to produce only winners, using functioning information solely to identify people whose needs the current system fails to identify adequately. Such an approach would ensure that no one is left behind but would not be cost neutral.

In conclusion, this evaluation shows that the concept of disability based on functioning (via WHODAS) differs hugely from the impairment-based concept of civil invalidity currently used in Italy. This is not surprising. One approach seeks to assess, in a scientifically tested way, the level of activity and participation and the kind and nature of the problems people have. The other limits itself to assessing the existence, or the discretionarily perceived existence, of a medical condition. The considerable difference between the two concepts shows how critical it is to include functioning in Italy’s disability assessment. This will contribute to better identification of the people needing support, better targeting of costly benefits and services, and a better link with regional and local needs assessments. The pilot has shown that Italy’s regions are well able to implement the necessary change.

References

[2] Bond, T. (2015), Applying the Rasch Model, Routledge, https://doi.org/10.4324/9781315814698.

[9] Boone, W., J. Staver and M. Yale (2014), Rasch Analysis in the Human Sciences, Springer Netherlands, Dordrecht, https://doi.org/10.1007/978-94-007-6857-4.

[14] Federici, S. et al. (2016), “World Health Organization disability assessment schedule 2.0: An international systematic review”, Disability and Rehabilitation, Vol. 39/23, pp. 2347-2380, https://doi.org/10.1080/09638288.2016.1223177.

[13] Ferrer, M. et al. (2019), “WHODAS 2.0-BO”, Revista de Saúde Pública, Vol. 53, p. 19, https://doi.org/10.11606/s1518-8787.2019053000586.

[8] Holland, P. and H. Wainer (1993), Differential item functioning, Lawrence Erlbaum Associates, Inc.

[12] Mayrink, J. et al. (2018), “Reference ranges of the WHO Disability Assessment Schedule (WHODAS 2.0) score and diagnostic validity of its 12-item version in identifying altered functioning in healthy postpartum women”, International Journal of Gynecology & Obstetrics, Vol. 141, pp. 48-54, https://doi.org/10.1002/ijgo.12466.

[4] Nunnally, J. and I. Bernstein (1994), Psychometric Theory (3rd edition), McGraw-Hill, New York.

[1] Rasch, G. (1960), Probabilistic Model for Some Intelligence and Achievement Tests, Danish Institute for Educational Research, Copenhagen.

[10] Smith, E. (2002), “Detecting and Evaluating the Impact of Multidimensionality Using Item Fit Statistics and Principal Component Analysis of Residuals”, Journal of Applied Measurement, Vol. 3/2, pp. 205-31.

[11] Smith, R. and C. Miao (1994), “Assessing Unidimensionality for Rasch Measurement”, in Objective Measurement: Theory into Practice: Volume 2, Greenwich, Ablex.

[7] Smith, R., R. Schumacker and M. Bush (1998), “Using item mean squares to evaluate fit to the Rasch model”, Journal of Outcome Measurement, Vol. 2/1, pp. 66-78.

[3] Tennant, A. and P. Conaghan (2007), “The Rasch measurement model in rheumatology: What is it and why use it? When should it be applied, and what should one look for in a Rasch paper?”, Arthritis Care & Research, Vol. 57/8, pp. 1358-1362, https://doi.org/10.1002/art.23108.

[6] Yen, W. (1993), “Scaling Performance Assessments: Strategies for Managing Local Item Dependence”, Journal of Educational Measurement, Vol. 30/3, pp. 187-213, https://doi.org/10.1111/j.1745-3984.1993.tb00423.x.

[5] Yen, W. (1984), “Effects of Local Item Dependence on the Fit and Equating Performance of the Three-Parameter Logistic Model”, Applied Psychological Measurement, Vol. 8/2, pp. 125-145, https://doi.org/10.1177/014662168400800201.

Legal and rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2023

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at https://www.oecd.org/termsandconditions.