Grok, the AI language model introduced by Elon Musk’s xAI, recently rolled out to a wide audience, and users have begun spotting glitches. A tweet from security tester Jax Winterbourne showed Grok refusing a query with the message, “I’m afraid I cannot fulfil that request, as it goes against OpenAI’s use case policy.” This raised eyebrows online, because Grok is not developed by OpenAI, the creator of ChatGPT, which Grok is positioned to rival.
Notably, xAI representatives did not dispute that the behaviour occurs. In response, xAI employee Igor Babuschkin explained, “The issue here is that the web is full of ChatGPT outputs, so we accidentally picked up some of them when we trained Grok on a large amount of web data. This was a huge surprise to us when we first noticed it. For what it’s worth, the issue is very rare, and now that we’re aware of it, we’ll make sure that future versions of Grok don’t have this problem. Don’t worry, no OpenAI code was used to make Grok.”
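Babuschkin’s explanation amounts to a data-contamination problem: web-scraped training text can contain verbatim ChatGPT transcripts, refusal messages included. As a rough illustration of how such text might be screened out before training, here is a minimal sketch of a phrase-based corpus filter. The marker phrases, function names, and sample documents are assumptions made for illustration only; they do not describe xAI’s actual data pipeline.

```python
# Illustrative sketch: a naive pre-training data filter that drops web
# documents containing telltale ChatGPT boilerplate. Phrase list and names
# are assumptions for illustration, not any real production pipeline.

CHATGPT_MARKERS = [
    "as an ai language model",
    "i cannot fulfil that request",
    "openai's use case policy",
]

def looks_like_chatgpt_output(document: str) -> bool:
    """Return True if the document contains a known ChatGPT refusal phrase."""
    lowered = document.lower()
    return any(marker in lowered for marker in CHATGPT_MARKERS)

def filter_corpus(documents):
    """Yield only documents that do not appear to be ChatGPT transcripts."""
    for doc in documents:
        if not looks_like_chatgpt_output(doc):
            yield doc

if __name__ == "__main__":
    sample = [
        "The mitochondria is the powerhouse of the cell.",
        "I'm sorry, but as an AI language model I cannot fulfil that request.",
    ]
    print(list(filter_corpus(sample)))  # keeps only the first document
```

In practice, keyword filters like this are only a first pass; they miss paraphrased model output and can wrongly discard articles that merely quote such phrases, which is one reason contamination is hard to eliminate entirely.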
However, Winterbourne expressed scepticism, stating, “Thanks for the response. I will say it’s not very rare and occurs frequently when involving code creation. Nonetheless, I’ll let people who specialize in LLM and AI weigh in on this further. I’m merely an observer.”
Despite xAI’s explanation, some experts find it improbable, because large language models typically do not reproduce their training data verbatim. AI researcher Simon Willison remarked, “I’m a bit suspicious of the claim that Grok picked this up just because the Internet is full of ChatGPT content.” He suggested that Grok may instead have been instruction-tuned on datasets that deliberately included ChatGPT output.
As OpenAI’s large language models (LLMs) have grown more powerful, fine-tuning AI models on synthetic data, especially data generated by other language models, has become common. In March, Stanford University researchers used outputs from OpenAI’s GPT-3 model to fine-tune Meta’s LLaMA 7B model for instruction-following, illustrating how widespread the practice has become in the AI community.
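For readers unfamiliar with the technique, the sketch below shows, in broad strokes, what instruction tuning on synthetic data looks like: instruction/response pairs written by a stronger model are formatted into prompts and used to fine-tune a smaller causal language model. The model name (gpt2 as a stand-in), prompt template, and tiny dataset are illustrative assumptions, not the actual Stanford Alpaca or Grok training setup.

```python
# Minimal sketch of instruction tuning on synthetic data, assuming the
# Hugging Face transformers and PyTorch APIs. Hyperparameters and data are
# toy values chosen only to make the example self-contained and runnable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; Alpaca fine-tuned LLaMA 7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Synthetic instruction/response pairs, e.g. generated by a stronger model.
examples = [
    {"instruction": "Explain overfitting in one sentence.",
     "response": "Overfitting is when a model memorizes training data instead of generalizing."},
    {"instruction": "Give a synonym for 'rapid'.",
     "response": "Swift."},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(2):
    for ex in examples:
        prompt = (f"### Instruction:\n{ex['instruction']}\n\n"
                  f"### Response:\n{ex['response']}")
        batch = tokenizer(prompt, return_tensors="pt",
                          truncation=True, max_length=512)
        # For causal LM fine-tuning, the labels are the input ids themselves.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The point of the example is simply that nothing in this process copies code from the teacher model; what transfers is the style and content of its text, which is why a student model can end up parroting another model’s refusal boilerplate.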
While training on another model’s outputs is widespread in machine learning, the episode intensified the rivalry between OpenAI and xAI, which is rooted in Elon Musk’s past criticisms of OpenAI. As news circulated that Grok may have borrowed from OpenAI, the official ChatGPT account commented, “We have a lot in common,” quoting Winterbourne’s X post. In response, Musk wrote, “Well, son, since you scraped all the data from this platform for your training, you ought to know.”