17. Conclusions and prospects

Dirk Van Damme
OECD
France

This volume brings together assessment data and analyses of academic skills such as critical thinking from institutions in six different higher education systems, using the Council for Aid to Education’s (CAE) CLA+ assessment instrument and its international variants. It is the first internationally comparative endeavour to assess generic, 21st-century academic skills across institutions and systems. The OECD’s AHELO Feasibility Study (2008-13) proved that such comparative assessment is feasible, but its assessment data were never published and the study did not proceed to a Main Study. In the 10 years since the end of the AHELO Feasibility Study, the assessment of higher education learning outcomes has become an important ambition for policy makers, researchers and higher education leaders. Various projects have seen the light of day, often at national level and using very different approaches and assessment instruments. This is not necessarily a bad thing, but rather a sign of the collective learning going on in higher education systems.

There is great demand for valid and reliable internationally comparative assessments of skills that matter for the 21st-century workplace. The global marketplace clearly values generic skills such as critical thinking and problem solving. Global employers no longer automatically trust that higher education degrees and qualifications reliably signal these skills and have increasingly turned to their own assessment practices (see Chapter 1 in this volume). The almost complete lack of reliable comparative metrics of what students learn in higher education institutions could, potentially, become a major systemic risk for the sector. International rankings of higher education institutions are used as a proxy for the quality of institutions and the credentials they deliver, yet they offer little measure of the quality of teaching and learning. In the absence of any better metric, the heavy use of such rankings indicates a clear need for reliable data on the skills graduates need to compete in the labour market.

No one has yet developed reliable comparative metrics of learning and skills development in higher education. But the present volume shows that we are making progress. In this closing chapter of the book, we will summarise the main results of our collaborative enterprise, reflect on the lessons learnt and indicate some prospects.

Perhaps the first conclusion is the most important one: An international, comparative assessment of one of the most relevant learning outcomes of higher education is feasible. The chapters in the first part of this volume discuss in detail the construct validity, reliability and cross-cultural validity issues associated with an international assessment of higher education learning outcomes in the domain of generic skills. Their conclusion is clear: the CLA+ International instrument has potential as a valid and reliable assessment instrument. Of course, it is not the only assessment tool available on the market, and it only covers a specific segment of relevant learning outcomes and skills, but it has been shown to function well in different contexts, across different systems and for various groups of students.

Moreover, as shown in Chapter 7, the CLA+ assessment has empirically confirmed predictive validity for career and labour market outcomes later in life. Analysis of the US assessment data, linked to surveys administered to employers and career advisors, demonstrates that the critical thinking skills assessed by the CLA+ instrument are predictive of future educational success, career development and labour market outcomes. Assessments of relevant employability skills such as critical thinking provide more powerful indicators of human capital than measures of foundation skills such as literacy and numeracy.

That an international assessment is feasible was already the conclusion of the AHELO Feasibility Study, completed in 2013. But the CLA+ has further improved by learning from its implementation, from analytical research on the data gathered, and from the international collaboration of which this report is the result (see Chapter 3 in this volume). Significant progress has been made on, among other things, item development and, more recently, computer-based testing and computer-assisted scoring. This has greatly contributed to the usability and cost-efficiency of the assessment instrument, the processing of the data and the reporting of the analysis.

The experiences in the systems reported on in Part III of this volume and the methodological robustness of the instrument provide convincing proof that a wider implementation of the assessment in more institutions and systems is possible. A closer look into the substantive findings from the assessment in six systems in the next section will suggest that it is also worth doing.

Part II of this report analyses the assessment data of over 120 000 students included in the aggregated database across institutions and systems. Of these, close to 100 000 were students in the United States. These students, almost equally split between those entering and those exiting a first-degree programme, were assessed with equivalent versions of the CLA+ instrument between 2015 and 2020. With the exception of Italy, all systems carried out multiple administrations of the assessment.

Across the sample, students entering a higher education programme performed on average at the ‘developing’ mastery level of the test, while exiting students performed on average at the ‘proficient’ level. The shift is relatively small (d = .10) but statistically significant. The shape of the distribution remains more or less the same across both subsamples, suggesting that the entire distribution shifted towards higher scores. The spread is nonetheless quite wide, with on average 21% of students (average of country averages) performing at the lowest ‘emerging’ mastery level. Thus, across countries, roughly one in five students performed at the lowest mastery level, while 15% of students performed at the ‘accomplished’ and ‘advanced’ levels.
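To give a sense of scale for this effect size, the sketch below shows how a standardised mean difference (Cohen’s d) of roughly .10 between entering and exiting cohorts can be computed. This is a minimal illustration only, not part of the CLA+ scoring methodology; the score scale, sample sizes and simulated data are hypothetical.

```python
# Minimal illustrative sketch: computing Cohen's d between two cohorts.
# The scores below are simulated placeholders, not actual CLA+ data.
import numpy as np

def cohens_d(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Cohen's d using the pooled standard deviation of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
    pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (group_b.mean() - group_a.mean()) / pooled_sd

# Hypothetical entering and exiting cohorts on an arbitrary score scale.
rng = np.random.default_rng(0)
entering = rng.normal(loc=1050, scale=150, size=5000)
exiting = rng.normal(loc=1065, scale=150, size=5000)

# A mean difference of 15 points against a standard deviation of 150
# corresponds to d of about 0.10 for these illustrative parameters.
print(f"Cohen's d = {cohens_d(entering, exiting):.2f}")
```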

These general results across the six systems can be interpreted in different ways. Overall, it is encouraging to see that during their time in a higher education programme, students improved their critical thinking skills. However, given the importance that most higher education programmes attach to promoting critical thinking skills, the learning gain is smaller than could be expected. If universities really want to foster 21st-century skills such as critical thinking, they need to scale up their efforts. While universities produce graduates who can be considered, on average, proficient in critical thinking, the distribution of achievement is quite wide, with one-fifth of students performing at the lowest level. With half of exiting students performing at the two lowest levels, it is difficult to claim that a university qualification reliably signals the level of critical thinking skills expected by the global marketplace.

The analysis cannot positively confirm that the learning gain is caused by the teaching and learning experience within university programmes. It is possible that, for example, selection effects (selective drop-out), general maturing of the student population or learning outside university contribute to the average learning gain. However, the fact that the distribution of achievement remains more or less the same from entering to exiting students indicates that the entire student population moves upwards, suggesting that the learning gain stems from a common, shared learning experience.

International large-scale assessments of learning outcomes such as PISA, PIAAC, the Trends in International Mathematics and Science Study (TIMSS) and others show that variations in learning outcomes are quite heavily influenced by students’ background characteristics such as gender, language, family background and migration status. It is interesting to examine whether this is also the case for the assessment of critical thinking learning outcomes at university.

The impact of language (whether or not students’ primary language differs from the language of instruction and testing) proves to be statistically significant but rather small. Both entering and exiting students with a different mother tongue than the language of instruction perform slightly lower than other students, except for the US sample of entering students. But, given the importance of linguistic proficiency for completing the test, these results are actually quite positive: language barriers do not appear to meaningfully hinder the demonstration of critical thinking.

This finding from the aggregated database conceals contradictory findings in individual countries. With regard to language, it is interesting to note that in the United States sample, entering students with a native language different from the language of instruction performed better than students whose native language was the same, but this advantage was reversed by the time they left the institution (see Chapter 6). In the England sample, both entering and exiting students whose native language was English outperformed students with a different native language (see Chapter 13). There are also interesting differences in the impact of language on the two components of the test, the Performance Task (PT) and the Selected-Response Questions (SRQ), which mobilise different language proficiency skills.

Gender does not seem to have a large impact. There are some statistically significant but overall small differences between male and female students, with no clear general pattern. However, as the country chapters in Part III illustrate, gender can play a role at the national level. This is the case for Finland, where gendered patterns in scores by field of study can be identified (see Chapter 12).

The impact of family background has been examined through parents’ educational attainment. In contrast to language and gender, parents’ educational attainment was shown to have an impact on students’ critical thinking performance, both among entering and exiting students and for both the international and the US samples. This points to a persistent effect of students’ socio-economic and cultural status on their educational achievement, even at this stage of their educational trajectory. Students from more disadvantaged backgrounds also perform at a disadvantage in critical thinking.

The question of whether parents’ education also influences the learning gain between entering and exiting a university programme could not be answered conclusively. The data suggest that students with higher parental educational attainment achieved a slightly higher learning gain than students with lower parental educational attainment. However, the relative impact of selection and attrition versus education remains unclear. More sophisticated research designs, preferably longitudinal, would be needed to answer that question.

Some of the country chapters in Part III explore other relevant background variables and their relationship with the CLA+ assessment data. For example, in Mexico (Chapter 14), large differences were noted between students at campuses in metropolitan areas and students at campuses in remote, rural areas. In a country like Mexico, geography plays an important role through its association with economic development, social prosperity and levels of poverty and exclusion.

Analyses reported in Chapter 8 show that there are significant differences in assessed critical thinking skills between non-US students in different fields of study. On average across countries with relevant data, students in business and agriculture were found to have relatively low scores, while students in the humanities, sciences and social sciences were found to have relatively high scores. This pattern holds for both entering and exiting students, suggesting a combination of, or interaction between, selection and education effects. However, the highest learning gain between entering and exiting university was found among students in health and welfare.

A more or less similar pattern was found for US students but with a slightly different ranking. Students in science and engineering were found to have the highest scores, both when entering and exiting, followed by social sciences and humanities.

Chapter 8 also reports on an interesting analysis of differences in CLA+ scores across instructional formats for exiting non-US students. Although the differences are relatively small, seminars, lectures and science laboratories are associated with the highest scores (with averages at the ‘proficient’ mastery level), whereas service learning and field work are associated with lower scores. The positive result for lectures and the negative result for service learning and field work contradict popular opinions on higher education pedagogy, which favour activating instructional formats. Critical thinking seems to flourish in instruction that requires deep engagement with content, as is the case for lectures, laboratories and seminars.

Although the present study is not based on representative sampling within countries (except for Finland, which administered a system-wide assessment, with some institutions opting out), it is interesting to see whether there are meaningful differences in CLA+ results between countries. Chapter 9 analyses country-level differences in CLA+ scores for samples in five countries: the United States, the United Kingdom, Finland, Chile and Mexico. In the reporting of the results, countries were anonymised, as agreed with the country project managers in the study. Given the limitations of the sampling, the impossibility of treating national data as reliable country-level measures, and the agreement that participating institutions retain full ownership of the assessment data, it was neither possible nor desirable to rank countries. Still, some very interesting observations can be made. The comparison between countries is useful in exploring the importance of the national level as a relevant variable.

The data reported in Chapter 9 show clear variations in mastery levels between the five countries. Already when students enter university, they exhibit very different proficiency levels in critical thinking. In countries A and D, half or more of the entering students scored at the ‘proficient’, ‘accomplished’ or ‘advanced’ levels, whereas in other countries half or more of entering students scored at the two lowest mastery levels. Large variation was also shown for exiting students. In country D, 70% of exiting students scored at the ‘proficient’ level or higher, whereas in country C only 45% of exiting students scored at those levels. The average learning gain achieved while students were in university was unrelated to their mastery level when they started. Students in country C gained very little in proficiency even though they had started at a low level. Students in country D started at a much higher level but advanced a great deal before exiting, while students in country E started at a much lower level but gained nearly as much. The results in country C are clearly disappointing. Other systems show greater levels of progress, and country D makes clear that even with already high levels of critical thinking among entering students there is much opportunity to make significant progress during university.

The variability of CLA+ scores across country samples suggests that there are indeed significant differences between countries in the capacity of their education systems, prior to and within higher education, to develop critical thinking skills. As shown in the country-specific chapters in Part III of this volume, education systems differ in their policies, educational objectives and institutional cultures. Still, in an increasingly global context for higher education institutions and, especially, for the employability and social participation of graduates, higher education systems that are better equipped to foster critical thinking skills will find themselves in a better position in the 21st-century environment. Changes in skill demand affect all countries, albeit to degrees that depend on where they sit in the global value chain, and all of them should enable their higher education institutions to perform better in fostering critical thinking.

That said, the variability in the country-level data is not high. International rankings have created the perception that quality differences between higher education systems are huge, but the country-level data in this report contradict this. There are interesting country-level differences in the assessment of generic 21st-century skills, but they are small and certainly do not mirror the steep hierarchy suggested by international rankings. In any case, with the exception of the United States and Finland, larger and more representative samples are needed before anything meaningful can be concluded about performance differences between countries.

The chapters on individual countries or systems participating in this project in Part III clearly indicate the wide variability in decision-making processes, implementation of the assessment, its outcomes and their policy relevance. Nonetheless, despite different trajectories, all these systems want to better understand the role of generic, 21st-century skills development in higher education programmes.

The CLA+ assessment instrument was developed in the United States, where it has been implemented in a wide range of institutions and has become part of the assessment infrastructure for higher education (Chapter 10). It constitutes a response to the demand of the Spellings Commission (2006) for more evidence-based accountability of institutions through the assessment of students’ generic skills. Implementation of the assessment over the past 15 years has generated a wealth of data, fuelling interesting analyses and inspiring policy debates within institutions and at state and federal levels.

Italy was the first country to implement the CLA+ outside the United States. It was carried out as part of an initiative by the national evaluation agency, ANVUR, to assess student learning in Italian universities following the country’s participation in the OECD AHELO Feasibility Study (Chapter 11). In 2013, the CLA+ assessment was administered to samples of students from 12 Italian universities, followed by a second administration in 2015 in another 26 universities. The implementation in Italy was not without difficulties, especially with regard to student selectivity, motivation and scoring. However, the experiences of the Italian implementation were very instructive for other systems in the following years. After 2016, ANVUR turned away from the assessment of generic, 21st-century skills towards more discipline-focused testing.

Finland was the most recent country to implement the CLA+ instrument (Chapter 12). Like Italy, Finland participated in the AHELO Feasibility Study and had been looking for opportunities to develop its own implementation. Finland was the first country to implement the CLA+ at a system-wide scale, with representative sampling of students in participating institutions. The Finnish project thus provides the richest experience, and the most extensive data, of any CLA+ implementation outside the United States in terms of administration, analysis of data and relevance for policy development.

In England, the government asked the Higher Education Funding Council for England (HEFCE) (2015) to assess learning gain in higher education institutions. This provided an opportunity to start a research initiative in two newer, post-1992 universities to assess the development of generic 21st-century skills using the CLA+ instrument (Chapter 13). The Teaching Excellence Framework (TEF) (2016-17) provided space and funding for work on learning gain. Interestingly, the experiment in England included a longitudinal design, which proved ambitious. The project in England also demonstrated the potential of the assessment as a diagnostic tool for institutional improvement as well as an accountability-focused measure.

Like Italy and Finland, Mexico was an enthusiastic supporter of the OECD’s AHELO Feasibility Study. There, a large public university pioneered the CLA+ instrument in response to governmental initiatives to improve the quality of university teaching and learning (Chapter 14). Performance-based testing was seen as a powerful tool to assess the generic skills needed for workplace success. In 2017-18, three testing sessions were administered to over 8 500 students. The project not only generated very interesting data and analyses but also stimulated the institutional drive towards improving teaching and learning, and towards tackling huge disparities in student performance.

Outreach activities in Latin America, starting in 2017, provided the necessary groundwork for the implementation of the CLA+ in some countries on the continent (Chapter 15). By 2020, four private universities in Chile had started a project to use the CLA+ with support from the government. The case studies not only provided very interesting data but also an important institutional learning opportunity. The implementation of the assessment also stimulated interest in critical thinking as a learning objective in the curriculum.

Finally, while no actual testing has yet taken place in Australia and New Zealand (Chapter 16), policy developments there could eventually lead to the implementation of the CLA+ assessment. It is notable that interest is mainly coming from the vocational post-secondary sector rather than from universities. The chapter describes the growing interest in generic, 21st-century skills such as critical thinking for employability and citizenship, and how this is driving policy debates on the implementation of CLA+ as a measurement tool.

Over the past years, discussions have taken place with many more countries than those reported on in this book. Many systems see the relevance of assessing generic skills like critical thinking, but many face barriers. Resistance from institutions, faculties and staff, as well as implementation and funding problems, are just some of the issues that must be confronted to take the necessary steps forward.

The experiences in individual countries illustrate that a shared interest in the importance of generic, 21st-century skills for employability and citizenship drives decisions to implement the CLA+ assessment. In all countries, strong political interest, often triggered by external stakeholders such as the business community, has been a necessary condition for moving ahead. When there is clear political consensus in favour of generic skills and a supportive political context for institutions, things start to move.

A second lesson learnt is the power of assessment to drive the reform agenda in higher education. The saying that only what is assessed matters is true. Without assessment, debates on the importance of generic skills risk becoming partisan and divisive. With a credible evidence base, even if imperfect, the debate is fuelled with data and becomes realistic.

A third lesson is about the importance of an inclusive approach. The institutional context in higher education is extremely important, implying that no government – let alone an international organisation – can impose an assessment on institutions. In all countries, institutional consent has proven to be a critically important condition for success. The failure of the AHELO project to move to a Main Study was probably due to the fact that conventional decision making at governmental level, without duly organised processes of discussion and negotiation with institutions, is doomed to fail in a higher education environment. Several country reports in Part III of this volume also point to the importance of motivating staff to positively support the assessment.

The fourth lesson learnt is about students. In a higher education environment, it is nearly impossible to force students to sit a test if they do not see its added value for themselves. Several chapters in this volume explore the topic of student motivation and engagement. Suboptimal student motivation not only lowers participation rates but also undermines the quality of the assessment results. Students are willing to sit the test and do their best if they view it as serving their own interests. From this perspective, it is interesting to see the development of the CLA+ assessment towards rewarding successful students with digital badges and credentials, which prospective employers can access to get an idea of a candidate’s generic skills. This is a very promising development.

It would be premature to call the present volume and the experiences in the participating countries a sufficient basis for moving to a large-scale assessment of generic, 21st-century skills in higher education. However, this volume illustrates the power of assessment to drive the policy debate on the importance of generic skills such as critical thinking. An important opportunity is opening for governments and institutions to develop initiatives to assess critical thinking, using the CLA+ instrument or others. Such initiatives will be powerful collective learning opportunities from which the entire global higher education community can benefit.
