Take a moment to reflect on the AI advancements of the past two years as a whole. The pace at which AI is nearing human capabilities in various domains is astounding, calling for new benchmarks to assess its abilities.
The latest edition of the AI Index report from Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) is now out. This year’s report, more extensive than ever, provides a broad analysis of AI’s integration into our lives, from industry usage trends to international concerns over job displacement due to AI technologies. A key finding of the report is the level at which AI competes with human performance.
For those who haven’t been closely tracking AI’s progress, the strides it has made are quite startling. Starting with outperforming humans in image classification in 2015, AI quickly moved on to surpass us in basic reading comprehension by 2017, visual reasoning by 2020, and natural language inference by 2021. The rapid advancement of AI has rendered many older benchmarks inadequate. This has led to a rush among researchers to create new, tougher benchmarks that not only test AI’s competencies but also distinguish between what AI can do and where humans still excel.
Even on these perhaps outdated benchmarks, the trends outlined in the report are unequivocal: the steep inclines in recent performance trajectories show just how quickly AI is evolving, and these technologies are still in their infancy. According to the report, AI still struggles with complex cognitive tasks such as solving advanced mathematics problems and visual commonsense reasoning. Yet ‘struggles’ might not be entirely accurate given the significant improvements noted.
On the MATH dataset, consisting of 12,500 high-level math problems, AI’s performance has surged. The solution rate rose from a mere 6.9% in 2021 to 84.3% in 2023, when a GPT-4-based model came close to the human baseline of 90%. Consider also visual commonsense reasoning (VCR). This goes beyond mere object recognition, testing how well AI can apply everyday knowledge in visual scenarios to predict outcomes. From 2022 to 2023, AI’s VCR scores rose by 7.93% to reach 81.60, nearing the human baseline of 85.
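To put those figures side by side, here is a rough Python sketch of the arithmetic. It uses only the numbers quoted above; the class and function names are made up for illustration. One assumption is worth flagging: the text does not say whether the 7.93% VCR rise is a relative increase or a jump in percentage points, and the sketch treats it as relative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Illustrative container; the name and fields are hypothetical."""
    name: str
    score: float           # latest score quoted in the article
    human_baseline: float  # human baseline on the same scale


def gap_to_human(result: BenchmarkResult) -> float:
    """Points by which the model still trails the human baseline."""
    return result.human_baseline - result.score


def implied_previous_score(current: float, relative_gain_pct: float) -> float:
    """Back out the earlier score, ASSUMING the gain is a relative increase."""
    return current / (1 + relative_gain_pct / 100)


# Figures quoted in the article.
math_2021_score = 6.9
math_2023 = BenchmarkResult("MATH", score=84.3, human_baseline=90.0)
vcr_2023 = BenchmarkResult("VCR", score=81.60, human_baseline=85.0)

# MATH: absolute gain in percentage points and the fold increase since 2021.
math_gain_points = math_2023.score - math_2021_score
math_fold_increase = math_2023.score / math_2021_score

# VCR: the 2022 score implied by a 7.93% relative rise (an assumption).
vcr_2022_estimate = implied_previous_score(vcr_2023.score, 7.93)

print(f"MATH gain since 2021: {math_gain_points:.1f} points "
      f"({math_fold_increase:.1f}x)")
for result in (math_2023, vcr_2023):
    print(f"{result.name}: {gap_to_human(result):.1f} points below human baseline")
print(f"VCR 2022 score implied by a relative rise: {vcr_2022_estimate:.2f}")
```

If the 7.93% were instead a rise in percentage points, the implied 2022 score would simply be 81.60 minus 7.93, so the estimate above should be read as one interpretation rather than a figure from the report.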
Rewind just five years, and the idea of expecting a computer to “understand” a visual context in this manner would have seemed far-fetched.
AI has also made significant inroads in generating written content for various professions, though large language models (LLMs) often produce what is euphemistically termed ‘hallucinations’—misleading or incorrect information presented as facts. This issue was highlighted last year when lawyer Steven Schwartz, who used ChatGPT for legal research without verifying its accuracy, was fined $5,000 by a judge for submitting a court document containing fabricated legal cases.
To assess the prevalence of hallucinations in LLMs, the report draws on the HaluEval benchmark, whose results indicate that hallucination remains a notable challenge.
Moreover, to assess LLMs’ ability to provide truthful information, the TruthfulQA benchmark uses questions on topics such as health, law, finance, and politics to probe common misconceptions. Here, GPT-4, tested in early 2024, scored 0.59, almost three times the score of earlier models such as GPT-2 tested in 2021, indicating steady improvement in accuracy.
In the realm of AI-generated images, Midjourney’s depictions of Harry Potter over a span of 22 months reflect rapid advances in text-to-image generation. In the Holistic Evaluation of Text-to-Image Models (HEIM), various text-to-image models were assessed for their ability to generate images that align with text prompts. Among these, OpenAI’s DALL-E 2 was notable for its alignment of images to text, while the Stable Diffusion-based Dreamlike Photoreal model stood out for image quality, aesthetics, and originality.
Last year (2023) was a monumental year for AI, and 2024 has only added to the excitement with groundbreaking developments like Suno, Sora, Google Genie, Claude 3, Channel 1, and Devin, along with the looming potential of GPT-5.
AI’s rapid development is not slowing down, making it an ever more integral part of our technological landscape. Stay tuned for the second instalment of our coverage on this topic, which looks at global perceptions of AI’s safety, trustworthiness, and ethics.