The AI behind ChatGPT really does seem to be getting dumber — but no one can quite figure out why

Hasan Chowdhury   

  • It's not just you: new research suggests ChatGPT's AI model really is getting dumber.
  • A paper from Stanford and UC Berkeley scientists found GPT-4's performance had dropped recently.

There's been a growing feeling for a while now that the AI model behind ChatGPT is, frankly, getting dumber.

There's now some hard evidence to suggest that OpenAI's prized possession really might be losing some of its sheen.

A new paper published on Tuesday by researchers at Stanford University and UC Berkeley, exploring how ChatGPT's behavior has changed over time, found that the performance of the chatbot's underlying GPT-3.5 and GPT-4 AI models does, in fact, "vary greatly."

Not only does performance vary, but GPT-4, the more advanced "multimodal" model that can understand images as well as text, seems to have performed a whole lot worse over time in the tasks it was tested on.

These tasks were sufficiently varied to make sure the model was really being given a fair assessment of its capabilities: math problems, responses to sensitive questions, generating code, and visual reasoning were all part of the evaluation process.

But even with a variety of tasks to show its chops, GPT-4 came out looking pretty underwhelming.

It was found to have 97.6% accuracy in identifying prime numbers in March, compared with a shocking 2.4% in June; it turned out "more formatting mistakes in code generation" last month than it did earlier this year, and it was generally "less willing to answer sensitive questions."
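To make the accuracy figures concrete, here is a minimal sketch of how a yes/no prime-identification test could be scored. It is an illustration under stated assumptions, not the researchers' actual evaluation code: the function names `is_prime` and `accuracy` are hypothetical, and the sample numbers and answers are invented.

```python
def is_prime(n: int) -> bool:
    """Ground truth by trial division; fine for small illustrative inputs."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy(numbers, answers):
    """Fraction of yes/no answers that match the true primality of each number."""
    correct = sum(
        ans.strip().lower().startswith("yes") == is_prime(n)
        for n, ans in zip(numbers, answers)
    )
    return correct / len(numbers)


# Toy inputs invented for illustration; these are not data from the paper.
numbers = [7, 9, 11, 15]
answers = ["Yes", "Yes", "yes", "No"]
print(accuracy(numbers, answers))  # 0.75 (the answer for 9 is wrong)
```

A scorer like this only counts whether each final answer matches the ground truth, so a model that starts answering "no" to nearly everything would score very badly on a test set made up mostly of primes.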

No one can quite figure out why GPT-4 is changing

What the research doesn't seem to identify is why this performance drop has happened.

"The paper doesn't get at why the degradation in abilities is happening. We don't even know if OpenAI knows this is occurring," Ethan Mollick, a professor of innovation at Wharton, tweeted in response to the paper.

If OpenAI hasn't picked up on it, many in the AI community certainly have. Roblox product lead Peter Yang noted in May that GPT-4's answers are generated faster than they were previously "but the quality seems worse."

"Perhaps OpenAI is trying to save costs," he tweeted.

OpenAI's developer forum, meanwhile, is hosting an ongoing debate about a decrease in the quality of responses.

Because GPT-4 powers the more advanced version of ChatGPT that paying subscribers get access to, that's a bit of a problem for OpenAI. Its most advanced large language model should be giving it an edge in an increasingly fierce competition with its rivals.

As my colleague Alistair Barr noted earlier this month, many in the AI community are putting the deteriorating quality of GPT-4 down to a "radical redesign" of the model.

OpenAI has pushed back on this idea, with Peter Welinder, VP of product at OpenAI, tweeting last week: "No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one."

He may want to rethink that position after seeing this research.

Matei Zaharia, chief technology officer at Databricks and associate professor of computer science at UC Berkeley — as well as one of the co-authors of the research paper — tweeted that it "definitely seems tricky to manage quality" of responses of AI models.

"I think the hard question is how well model developers themselves can detect such changes or prevent loss of some capabilities when tuning for new ones," he tweeted.

Some, like Princeton computer science professor Arvind Narayanan, have pointed out important caveats in GPT-4's defense.

In a Twitter thread, he notes that the degradations reported in the paper might be "somewhat peculiar" to the tasks GPT-4 was given to do, as well as the evaluation method used. With the code generation test, he notes that GPT-4 adds "non-code text to its output," but the authors don't evaluate the correctness of the code.
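The code-generation caveat is easier to see with a small example. The sketch below assumes, purely for illustration, that the "non-code text" is a markdown-style fence wrapped around the code and that the check is a simple syntax test. The article does not describe the paper's actual checker, and the helper names `extract_code` and `is_valid_python` are hypothetical.

```python
import re

# Three backticks, built programmatically so this example stays readable.
FENCE_MARK = "`" * 3
FENCE = re.compile(FENCE_MARK + r"[\w+-]*\n(.*?)" + FENCE_MARK, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the response unchanged."""
    match = FENCE.search(response)
    return match.group(1) if match else response


def is_valid_python(source: str) -> bool:
    """Check syntax only; this says nothing about whether the code is correct."""
    try:
        compile(source, "<response>", "exec")
        return True
    except SyntaxError:
        return False


# A made-up model response: working code wrapped in a fence.
raw = FENCE_MARK + "python\nprint(1 + 1)\n" + FENCE_MARK

print(is_valid_python(raw))                # False: the fence itself is not Python
print(is_valid_python(extract_code(raw)))  # True: the code inside is fine
```

Under these assumptions, the same response counts as a failure or a success depending only on whether the wrapper is stripped first, which is why a formatting change can look like a drop in coding ability.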

That said, it's hard to ignore the questions of quality surrounding GPT-4 when a whole community of AI devotees is asking them. OpenAI had better be sure it has the answers.