AI and the Future of Skills, Volume 2
Methods for Evaluating AI Capabilities
As artificial intelligence (AI) expands its scope of applications across society, understanding its impact becomes increasingly critical. The OECD’s AI and the Future of Skills (AIFS) project is developing a comprehensive framework for regularly measuring AI capabilities and comparing them to human skills. The resulting AI indicators should help policymakers anticipate AI’s impacts on education and work.
This volume describes the second phase of the project, which explored three different approaches to assessing AI. First, the project explored the use of education tests, asking computer experts to evaluate AI’s performance on the OECD’s tests of reading, mathematics and science. Second, the project extended the rating of AI capabilities to tests used to certify workers for occupations. These tests present complex practical tasks and are potentially useful for understanding the application of AI in the workplace. Third, the project explored measures from direct AI evaluations. It commissioned experts to develop methods for selecting high-quality direct measures, categorising them according to AI capabilities and systematising them into single indicators. The report discusses the advantages and challenges of using these approaches and describes how they will be integrated into the development of indicators of AI capabilities.
Assessing AI capabilities on occupational tests
This chapter evaluates the capabilities of artificial intelligence (AI) in complex occupational tasks typical of real-world job settings. Using tasks from certification and licensing performance tests, the study aims to provide a more tangible assessment base than abstract constructs such as literacy and numeracy. Despite this clarity, the complexity of occupational tasks poses methodological challenges for gathering expert judgements of AI’s proficiency. Two pilot studies, covering 13 tasks across six occupations, revealed AI’s aptitude in basic reasoning and language processing and its limitations in nuanced and physically intricate activities. Expert feedback highlighted ambiguities in task descriptions and the difficulties of comparing AI and human skills. This chapter outlines the methodology, findings and implications of these assessments.