On evaluating artificial intelligence systems: Competitions and benchmarks

Anthony G. Cohn

DOI: https://doi.org/10.1787/d755c6d6-en

AI and the Future of Skills, Volume 1

Capabilities and Assessments

Artificial intelligence (AI) and robotics are major breakthrough technologies that are transforming the economy and society. The OECD’s Artificial Intelligence and the Future of Skills (AIFS) project is developing a programme to assess the capabilities of AI and robotics, and their impact on education and work.

This volume reports on the first step of the project: identifying which capabilities to assess and which tests to use in the assessment. It builds on an online expert workshop that explored this question from the perspectives of both psychology and computer science. The volume consists of expert contributions that review skills taxonomies and tests in different domains of psychology, as well as efforts in computer science to assess AI and robotics. It discusses the strengths and weaknesses of different approaches at length and outlines directions for the project. The report can therefore serve as a resource for researchers in multiple fields and for policy makers who wish to gain deeper insight into the complexity of machine capabilities.


https://doi.org/10.1787/5ee71f34-en

Chapter


This chapter discusses some of the approaches and methods used by the artificial intelligence (AI) community to measure and evaluate AI systems. It looks at the evolution of competitions, giving special attention to the Turing Test and the Winograd Schema Challenge. It also examines researchers' long-standing fascination with testing AI through games such as chess and Go. Several tests proposed for measuring the intelligence of AI systems are examined, as is the role of benchmark datasets in evaluating AI systems. The chapter ends with a discussion of the benefits and limitations of four approaches: custom datasets, benchmarks, competitions and qualitative evaluation.
