Is ChatGPT getting dumber?
Many users online have posed this exact question.
Namely, they’ve noticed that ChatGPT has slowed down and is no longer giving the kind of top-notch answers that impressed us all in the beginning.
So, what has happened?
According to the report published by Stanford University last week:
- Accuracy on math problems dropped by 95 percentage points;
- Code-generation quality dropped by 42 percentage points;
- Common-sense and reasoning skills declined steeply;
- ChatGPT has become “safer but less rational”.
More precisely, the researchers compared GPT versions from March 2023 and June 2023 and came to the surprising conclusion that the older version indeed performed better than the newer one on various tasks, such as answering sensitive questions or solving math problems. The newer version also made more formatting mistakes in the code it generated.
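For context, the report’s math benchmark reportedly consisted of yes/no primality questions (e.g. “Is 17077 a prime number? Think step by step.”), which are easy to grade automatically because the ground truth is machine-checkable. Here is a minimal scoring sketch in Python; the specific numbers and model answers below are illustrative, not taken from the report:

```python
def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n); fast enough for benchmark-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def score(model_answers: dict) -> float:
    """Fraction of questions where the model's yes/no answer matches ground truth."""
    correct = sum(ans == is_prime(n) for n, ans in model_answers.items())
    return correct / len(model_answers)

# Hypothetical answers from two model snapshots on the same three questions.
march = {17077: True, 17078: False, 17081: False}   # all correct
june  = {17077: False, 17078: False, 17081: False}  # misses the one prime
print(score(march), score(june))  # 1.0 vs roughly 0.67
```

Scoring this way is what lets researchers compare thousands of answers from two model versions without reading any of them by hand.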
Our findings show that the behavior of the “same” LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality.
Stanford University Report
An update doesn’t necessarily mean an upgrade
Now, whether this is really true we cannot know for sure, because the model is closed-source. That’s why, as we pointed out in one of our previous articles, Meta making Llama 2 open-source may be one of the best decisions when it comes to LLM alternatives.
Either way, what we do know is that LLMs can be updated over time based on data and user feedback. In other words, constant criticism regarding sensitive topics and so-called ChatGPT “hallucinations” may have resulted in the model becoming more careful, giving calculated or vague responses.
It is also an interesting question whether an LLM service like GPT4 is consistently getting “better” over time. It’s important to know whether updates to the model aimed at improving some aspects actually hurt its capability in other dimensions.
Stanford University Report
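The kind of continuous monitoring the researchers call for can be approximated by anyone with API access: pose the same question to two pinned snapshots of a model and diff the answers. A sketch in Python, where `ask` is a stub standing in for a real API call and the snapshot names mirror OpenAI’s dated model identifiers:

```python
from typing import Callable, Dict

def compare_snapshots(question: str,
                      ask: Callable[[str, str], str],
                      models=("gpt-4-0314", "gpt-4-0613")) -> Dict[str, str]:
    """Pose the same question to each pinned model snapshot and collect answers."""
    return {model: ask(model, question) for model in models}

# Stub in place of a real API call, for illustration only.
def fake_ask(model: str, question: str) -> str:
    canned = {
        "gpt-4-0314": "17077 is a prime number.",
        "gpt-4-0613": "No.",  # terse, as some users reported for later versions
    }
    return canned[model]

answers = compare_snapshots("Is 17077 a prime number? Think step by step.", fake_ask)
for model, answer in answers.items():
    print(f"{model}: {answer}")
```

Running this over a fixed question set on a schedule, and alerting when the diff grows, is one simple way to catch silent behavior changes in a hosted model.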
But why would OpenAI want to do this?
Most probably because they want to sell more ChatGPT Plus subscriptions; it’s as simple as that.
As explained in the AI newsletter Synthetic Mind:
The more users there are, the more ChatGPT power is shared. If OpenAI can make each account use slightly less power, they can sell more subscriptions. Lower quality answers = Less computing power = Less money.
Another reason that comes to mind is that OpenAI may be “saving” on AI chips, considering the ongoing AI race between the US and China and the huge overall demand for AI chips.
The company still hasn’t made an official statement regarding this topic.
In conclusion, the time span between March and June isn’t that long, yet the behavior of both GPT-3.5 and GPT-4 changed heavily in that period. So, if you’ve noticed a decrease in response quality, it’s not that your standards have risen or that there’s something wrong with your prompts; the latest version really is weaker than the older ones. At least according to this report, that is.