A recent study conducted by researchers at the University of Reading has uncovered a striking finding: artificial intelligence (AI) can outperform real students in university exams. The research, published in the journal PLOS One, involved creating 33 fictitious students and using the AI tool ChatGPT to generate answers for undergraduate psychology exams. The results showed that the AI-generated answers scored, on average, half a grade boundary higher than those of actual students.
In the study, the AI-generated essays and exam answers were submitted without the markers' knowledge. Remarkably, 94% of the AI submissions raised no concerns, suggesting they were effectively undetectable. This low detection rate means students could use AI to cheat and earn higher grades than peers who do not use such tools.
The study’s authors, Associate Prof. Peter Scarfe and Prof. Etienne Roesch, emphasized the importance of these findings for educators globally. “Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments,” said Dr. Scarfe. He noted that while many institutions have moved away from traditional exams to make assessments more inclusive, the rise of AI presents new challenges.
“We won’t necessarily go back fully to handwritten exams, but the global education sector will need to evolve in the face of AI,” Dr. Scarfe added.
The study included fake exam answers and essays for first-, second-, and third-year modules. The AI-generated answers outperformed those of real undergraduates in the first- and second-year modules, but in the third-year exams human students scored better. This result fits the researchers' observation that current AI still struggles with more abstract reasoning.
The researchers describe it as the largest and most robust blind study of its kind to date, and it raises significant concerns about the influence of AI in education. Some institutions are already responding to these challenges. Glasgow University, for instance, has reintroduced in-person exams for one course in response to the growing use of AI by students.
Earlier this year, a report by The Guardian found that most undergraduates used AI programs to assist with their essays, although only 5% admitted to submitting unedited AI-generated text.
The findings from the University of Reading study serve as a wake-up call for educators worldwide. As AI technology continues to advance, it is crucial for the education sector to adapt and find new ways to maintain the integrity of assessments. The challenge will be to balance the benefits of AI in enhancing learning with the need to ensure fair and accurate evaluation of student performance.