The effect of incentives on the final response rate
Self-reported motivation to complete the survey
AI numeracy performance in 2016 and 2021, by question difficulty
Predicted AI performance on PISA science questions in 2022 by core experts and larger expert group, by question difficulty
Divergence in experts’ evaluations in different assessments
Share of questions receiving more than 20% uncertain ratings in different assessments
Experts' ratings of AI and GPT-3.5 performance on PISA science questions
AI and robotics performance on entire task, by task format
Distribution of expert ratings of AI and robotics performance on entire task
Average AI and robotics performance, by expert and expertise
AI and robotics performance in broad capability domains, by task and expertise
AI capability expert ratings and their comparison to the ratings of the first study
AI capability expert ratings, by task
Rater agreement across all facets
Rater value selection on validity facets
Rater value selection on consistency facets
Rater value selection on fairness facets
Rater value selection on validity facets for eight evaluation campaigns by NIST and LNE
Rater value selection on consistency facets for eight evaluation campaigns by NIST and LNE
Rater value selection on fairness facets for eight evaluation campaigns by NIST and LNE