AI and the Future of Skills, Volume 2

Methods for Evaluating AI Capabilities

As artificial intelligence (AI) expands its scope of applications across society, understanding its impact becomes increasingly critical. The OECD's AI and the Future of Skills (AIFS) project is developing a comprehensive framework for regularly measuring AI capabilities and comparing them to human skills. The resulting AI indicators should help policymakers anticipate AI’s impacts on education and work.

This volume describes the second phase of the project, which explored three approaches to assessing AI. First, the project explored the use of education tests by asking computer experts to evaluate AI’s performance on the OECD’s tests in reading, mathematics and science. Second, it extended the rating of AI capabilities to tests used to certify workers for occupations. These tests present complex practical tasks and are potentially useful for understanding the application of AI in the workplace. Third, the project explored measures from direct AI evaluations. It commissioned experts to develop methods for selecting high-quality direct measures, categorising them according to AI capabilities and systematising them into single indicators. The report discusses the advantages and challenges of these approaches and describes how they will be integrated into the development of indicators of AI capabilities.

English

AI direct tests: LNE and NIST evaluations

Artificial intelligence (AI) has developed significantly in recent years. Its increased application in the industrial and domestic worlds raises questions about how it complements human intelligence. It seems possible to evaluate this complementarity only task by task or capability by capability. This chapter proposes a method and criteria (nature of the evaluation task, application area, level of difficulty, etc.) for systematising the tasks on which AI and robotics systems have been evaluated in the past. This makes it possible to identify the areas already covered and those yet to be evaluated. The method is applied to evaluation campaigns conducted over recent decades by the National Institute of Standards and Technology in the United States and the French Laboratoire National de Métrologie et d’Essais. The chapter concludes with a proposal for next steps to complete the mapping based on expert judgement.

English
