The world's most-powerful AI model suddenly got 'lazier' and 'dumber.' A radical redesign of OpenAI's GPT-4 could be behind the decline in performance.

Alistair Barr   

  • GPT-4 was slow but accurate at first, and shockingly expensive to use and run.
  • Lately, the giant AI model has become faster, but performance has declined.

The world's most-powerful AI model has become, well, less powerful. That has industry insiders whispering about what may be a major redesign of the system.

In recent weeks, users of OpenAI's GPT-4 have been complaining loudly about degraded performance, with some calling the model "lazier" and "dumber" than it used to be, both in its reasoning and in its other output.

In comments on Twitter and in OpenAI's online developer forum, users vented their frustrations with issues such as weakened logic, more erroneous responses, losing track of provided information, trouble following instructions, forgetting to add brackets in basic software code, and only remembering the most recent prompt.

"The current GPT4 is disappointing," wrote one developer who uses GPT-4 to help him code functions for his website. "It's like driving a Ferrari for a month then suddenly it turns into a beaten up old pickup. I'm not sure I want to pay for it."

Peter Yang, a product lead at Roblox, tweeted that the model was generating faster outputs but the quality was worse. "Just simple questions like making writing more clear and concise and generating ideas," he added. "The writing quality has gone down in my opinion." He asked if anyone else had noticed this.

"I've found it to be lazier," another user, Frazier MacLeod, replied.

Christi Kennedy wrote on OpenAI's developer forum that GPT-4 had started looping outputs of code and other information over and over again.

"It's braindead vs. before," she wrote last month. "If you aren't actually pushing it with what it could do previously, you wouldn't notice. Yet if you are really using it fully, you see it is obviously much dumber."

From slow and expensive, to fast and inaccurate

This is quite a change from earlier this year when OpenAI was wowing the world with ChatGPT and the tech industry awaited the launch of GPT-4 with rapt anticipation. ChatGPT originally ran on GPT-3 and GPT-3.5. These are the giant, underlying AI models that power its uncanny answers.

The larger GPT-4 was launched in March and quickly became the go-to model for developers and other tech industry insiders. It's considered the most-powerful AI model available broadly and is multimodal, which means it can understand images as well as text inputs.

After the initial rush to try out this new model, some were shocked by their bills for using GPT-4. It was slow but very accurate, according to Sharon Zhou, CEO of Lamini, a startup that helps developers build custom large language models.

The Ship of Theseus

That was the situation until a few weeks ago. Then GPT-4 got quicker, but the performance noticeably waned, fueling talk across the AI community that a major change was underway, according to Zhou and other experts.

They think OpenAI is creating several smaller GPT-4 models that act similarly to the large model but are less expensive to run.

The approach is called a Mixture of Experts, or MOE, according to Zhou. The smaller expert models are each trained on different tasks and subject areas. There could be a mini biologist GPT-4 and one for physics, chemistry, and so on. When a GPT-4 user asks a question, the new system knows which expert model to send that query to. The new system might decide to send a query to two or more of these expert models just in case, and then mash up the results.
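The routing Zhou describes can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not OpenAI's implementation: the three "experts" are made-up one-line functions, the gate scores are supplied by hand rather than produced by a learned gating network, and the names `EXPERTS`, `softmax`, and `route` are hypothetical.

```python
import math

# Hypothetical toy "experts": each is just a function from a number to a number.
# Real experts would be sub-networks with learned weights.
EXPERTS = {
    "biology": lambda x: 2.0 * x,
    "physics": lambda x: x + 10.0,
    "chemistry": lambda x: -x,
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(x, gate_scores, top_k=2):
    """Send x to the top_k highest-scoring experts and blend their outputs.

    gate_scores maps expert name -> raw score. Here the scores are passed in
    by hand (an assumption for illustration); in a real system a gating
    network would compute them from the input itself.
    """
    chosen = sorted(gate_scores, key=gate_scores.get, reverse=True)[:top_k]
    weights = softmax([gate_scores[name] for name in chosen])
    # Only the chosen experts are evaluated; the rest are never run.
    output = sum(w * EXPERTS[name](x) for name, w in zip(chosen, weights))
    return chosen, output


chosen, output = route(3.0, {"biology": 2.0, "physics": 2.0, "chemistry": -1.0})
print(chosen, output)
```

In this sketch the query goes to the two highest-scoring experts and their answers are averaged by weight, which is one simple way to "mash up" results. The third expert is skipped entirely, which is where the cost saving would come from.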

"This idea has been around for a while and it's a natural next step," Zhou said.

Zhou compared this situation to the "Ship of Theseus" thought experiment, in which parts of the vessel were swapped out over time, raising the question: At what point does it become a whole new ship?

"OpenAI is taking GPT-4 and turning it into a fleet of smaller ships," she said. "From my perspective, it's a new model. Some would say it's the same."

Insider asked OpenAI about all this on Tuesday. The company, partly owned by Microsoft, did not respond.

This week, several AI experts posted what they claimed were details of GPT-4's architecture on Twitter. Yam Peleg, a startup founder, tweeted that OpenAI was able to keep costs down by using a mixture-of-experts model with 16 experts. SemiAnalysis also wrote about the inner workings of GPT-4 this week.

George Hotz, a developer and hacker, described an "eight-way mixture model" for GPT-4 during a recent podcast. Soumith Chintala, who co-founded the PyTorch open-source AI project at Meta, weighed in on Hotz's comments.

"I would *conjecture* that the speculations are roughly accurate but I don't have confirmation," Oren Etzioni, founding CEO of the Allen Institute for AI, wrote in an email to Insider after seeing the leaks online this week.

There are two main technical reasons to use an MOE approach: better generated responses, and cheaper, faster responses, he explained.

"The 'right' mixture will give you both but often there is a tradeoff between cost and quality," Etzioni added. "In this case, it seems anecdotally that OpenAI is sacrificing some quality for reduced cost. These models are very hard to evaluate (what constitutes a better response? In what cases?) so this isn't scientific, it's anecdotal."

OpenAI wrote about the MOE approach in 2022 research co-authored by President Greg Brockman.

"With the Mixture-of-Experts (MoE) approach, only a fraction of the network is used to compute the output for any one input. One example approach is to have many sets of weights and the network can choose which set to use via a gating mechanism at inference time," Brockman and his colleague Lilian Weng wrote. "This enables many more parameters without increased computation cost. Each set of weights is referred to as 'experts,' in the hope that the network will learn to assign specialized computation and skills to each expert."
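The claim that this "enables many more parameters without increased computation cost" comes down to simple arithmetic, sketched below. The numbers are invented for illustration and are not GPT-4's actual figures, and the function name `moe_cost` is hypothetical. The sketch also assumes every expert is the same size and ignores the small cost of the gating step.

```python
def moe_cost(num_experts, params_per_expert, active_experts):
    """Compare the parameters a model stores with the parameters it uses per input.

    Assumes equally sized experts and ignores the gating network's own cost.
    """
    total_params = num_experts * params_per_expert
    active_params = active_experts * params_per_expert
    return total_params, active_params, active_params / total_params


# Illustrative numbers only: 8 experts of 1 million parameters each,
# with 2 experts consulted for any one input.
total, active, fraction = moe_cost(8, 1_000_000, 2)
print(f"stored: {total:,}  used per input: {active:,}  fraction: {fraction:.0%}")
```

Under these made-up numbers the model stores 8 million parameters but touches only 2 million for any one input. Adding more experts would grow the first figure without changing the second.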

GPT-4's unnerving performance decline in recent weeks could well be related to this training and to OpenAI's rollout of this fleet of smaller expert GPT-4 models, Zhou said.

"When users test it, we are going to ask so many different questions. It won't do as well but it's collecting data from us and it will improve and learn," Zhou explained.