AI and the Future of Skills, Volume 2

Methods for Evaluating AI Capabilities

As artificial intelligence (AI) expands its scope of applications across society, understanding its impact becomes increasingly critical. The OECD's AI and the Future of Skills (AIFS) project is developing a comprehensive framework for regularly measuring AI capabilities and comparing them to human skills. The resulting AI indicators should help policymakers anticipate AI’s impacts on education and work.

This volume describes the second phase of the project, which explored three different approaches to assessing AI. First, the project asked computer experts to evaluate AI’s performance on the OECD’s education tests in reading, mathematics and science. Second, it extended the rating of AI capabilities to tests used to certify workers for occupations. These tests present complex practical tasks and are potentially useful for understanding the application of AI in the workplace. Third, the project explored measures from direct AI evaluations. It commissioned experts to develop methods for selecting high-quality direct measures, categorising them according to AI capabilities and systematising them into single indicators. The report discusses the advantages and challenges of each approach and describes how the approaches will be integrated into the development of indicators of AI capabilities.

Assessing AI capabilities with education tests

This chapter introduces three exploratory studies that assessed the capabilities of artificial intelligence (AI) through standardised education tests designed for humans. The first two studies, conducted in 2016 and 2021/22, asked experts to evaluate AI’s performance on the literacy and numeracy tests of the OECD’s Survey of Adult Skills (PIAAC). The third study collected expert judgements on whether AI can solve science questions from the OECD’s Programme for International Student Assessment (PISA). The studies aimed to refine the assessment framework for eliciting expert knowledge on AI using established educational assessments. They explored different test formats, response methodologies and rating instructions, along with two distinct assessment approaches: a “behavioural approach”, used in the PIAAC studies, that emphasised discussion within smaller expert groups, and a “mathematical approach”, adopted in the PISA study, that relied more heavily on quantitative data from a larger expert pool. The chapter presents the results of the studies and discusses the advantages and disadvantages of their methodological approaches.
