I've been in technology for a long time, and I rarely get excited or surprised by anything. But shortly after OpenAI's ChatGPT was released, I asked it to create a WordPress plugin for my wife's e-commerce site. When it finished and the plugin worked, I was truly amazed.
That was the beginning of my deep dive into chatbots and AI-assisted programming. Since then, I have put 10 large language models (LLMs) through 4 real-world tests.
Unfortunately, not all chatbots code equally well: 18 months after that first test, 5 of the 10 LLMs I tested still can't create a working plugin.
In this article, I'll show you how each LLM performed in my testing. There are two chatbots I recommend, but they cost $20 per month. The free versions of the same chatbots work well enough that you probably won't need to pay for them. The other chatbots, free or paid, aren't nearly as good. Until they perform better, I wouldn't risk using them, or recommend them, for your programming projects.
I've written a lot about using AI to help with programming. Unless it's a small, simple project like my wife's plugin, an AI can't write an entire app or program. But it's good at writing a few lines of code, and it's not bad at modifying code.
Rather than repeat everything I've written, I'll point you to this article: How to write code using ChatGPT: What ChatGPT can and can't do.
If you want to understand my coding tests, why I chose them, and why they're relevant to this review of 10 LLMs, read this article: How I test an AI chatbot's coding ability - and you can too.
First, let's compare chatbot performance.
[Chart: test results for the 10 LLMs. Image: David Gewirtz/ZDNET]
Now let's look at each chatbot individually. The chart above shows 10 LLMs, but we'll discuss 9 chatbots here, because the GPT-4 and GPT-4o results are both covered under ChatGPT Plus. Ready? Let's get started.
ChatGPT Plus
Pros: Passed all tests, solid coding results, Mac app
Cons: Hallucinations, no Windows app yet, sometimes doesn't work
Price: $20/month
LLMs: GPT-4o, GPT-4, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 4 of 4
Powered by GPT-4 and GPT-4o, ChatGPT Plus passed all of my tests. One feature I like is the dedicated app. When I test web programming, I keep my browser on one screen, my IDE open, and the ChatGPT Mac app running on another screen.
I also ran GPT-4o through my coding tests, and it passed them all, apart from one odd result.
Additionally, Logitech's Prompt Builder, which pops up at the press of a mouse button, can be set up to use the upgraded GPT-4o when connected to your OpenAI account. That lets you run a prompt with a tap of your thumb, which is very convenient.
The only thing I didn't like was that, on one of my tests, GPT-4o gave me a choice of several answers, one of which was wrong. I would have preferred that it simply give me the correct answer. A quick test is enough to show which answers are valid, but having to run it was a bit tedious. I didn't have that issue with GPT-4, so for now, that's the LLM setting I use with ChatGPT when coding.
Perplexity Pro
Pros: Multiple LLMs, search criteria displayed, good sources
Cons: Email-only login, no desktop app
Price: $20/month
LLMs: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 4 of 4
I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one shortcoming kept it out of the top spot: how you log in. Perplexity doesn't use usernames and passwords or passkeys, and it doesn't have multi-factor authentication. It just emails you a login PIN. It also doesn't have a separate desktop app, as ChatGPT does for the Mac.
What sets Perplexity apart from other tools is that it lets you run multiple LLMs. While you can't set an LLM for a particular session, you can easily go into the settings and select the active model.
For programming, you'll probably stick with GPT-4o since it performed well in all of my tests. But it might be interesting to cross-check your code across different LLMs. For example, if you've had GPT-4o write some regular expression code, consider switching to a different LLM to see what it thinks of the generated code.
As I'll explain below, most LLMs are unreliable, so don't take their verdicts as absolute. But you can use them as extra input when reviewing the original code. It's like an AI-driven code review.
Just don't forget to switch the active model back to GPT-4o.
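If you want something firmer than a second chatbot's opinion, you can also run the generated pattern against sample inputs yourself. Here's a minimal Python sketch of that idea. Everything in it is an assumption for illustration: the dollar-amount task, the two candidate patterns (labeled model_a and model_b), and the sample strings are hypothetical, not output from any of the chatbots I tested.

```python
import re

# Hypothetical candidate patterns, e.g. one suggested by each LLM, for
# validating a dollar amount such as "12", "12.5", or "12.50".
CANDIDATES = {
    "model_a": r"\d+(\.\d{1,2})?",
    "model_b": r"\d*\.?\d*",
}

# Hand-written samples: strings the pattern must accept and must reject.
SHOULD_MATCH = ["12", "12.5", "12.50", "0.99"]
SHOULD_REJECT = ["", ".", "12.345", "abc", "1,2"]


def evaluate(pattern):
    """Return the sample strings that the pattern classifies incorrectly."""
    compiled = re.compile(pattern)
    failures = []
    for sample in SHOULD_MATCH:
        # fullmatch requires the whole string to match, not just a prefix.
        if compiled.fullmatch(sample) is None:
            failures.append(sample)
    for sample in SHOULD_REJECT:
        if compiled.fullmatch(sample) is not None:
            failures.append(sample)
    return failures


def report(candidates):
    """Map each candidate's label to its list of misclassified samples."""
    return {name: evaluate(pattern) for name, pattern in candidates.items()}


if __name__ == "__main__":
    for name, failures in report(CANDIDATES).items():
        status = "OK" if not failures else f"wrong on {failures!r}"
        print(f"{name}: {status}")
```

An empty failure list only means the pattern agrees with the samples you wrote, so the check is only as good as the edge cases you think to include.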
ChatGPT Free
Cons: Rapid throttling, may get disconnected in the middle of work
Price: Free
LLMs: GPT-4o, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 3 of 4 in GPT-3.5 mode
ChatGPT is free for anyone to use. Both the Plus and free versions support GPT-4o, which passed all of my programming tests, but there are limitations when using the free app.
OpenAI treats free ChatGPT users as if they're in the cheap seats. When traffic is high or the servers are busy, free users get only GPT-3.5. The tool also allows only a certain number of queries before it downgrades you or shuts you out.
A few times, the free version of ChatGPT told me I had asked too many questions.
ChatGPT is a great tool, as long as you don't mind being shut out every now and then. Even GPT-3.5 outperformed all the other chatbots in my tests. The one test GPT-3.5 failed involved a fairly obscure programming tool written by a single programmer in Australia.
So, if budget matters and you can wait when you get cut off, use ChatGPT for free.
Perplexity AI
Pros: Free, passes most tests, wide variety of research tools
Cons: Limited to GPT-3.5, Adjust prompt results
Price: Free
LLM: GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 3 of 4
I'm splitting hairs here, but because the free version of Perplexity AI is based on GPT-3.5, its test results were clearly better than those of the other AI chatbots.
From a programming perspective, that's pretty much it. But from a research and organizational perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity to other AIs.
He likes that Perplexity provides more complete sources for his research questions, cites sources, organizes answers, and provides questions for further search.
So, if you need an AI for research as well as programming, consider the free version of Perplexity.
Chatbots to avoid for programming assistance
I tested nine chatbots, and four of them passed most of the tests. Most of the others, including one touted as ideal for programming, passed only one test each, and Microsoft's Copilot didn't pass any.
I mention them here because people will ask and because I've tested them thoroughly. Some of them work perfectly fine for other tasks, so if you're just curious about their capabilities, see my more general review.
Meta AI
Meta AI is Facebook's general-purpose AI. As the chart above shows, it failed three of my four tests.
The AI did produce a nice user interface, but one with no functionality at all. It also found my nasty bug, which is a fairly serious challenge. Given the specific knowledge required to find that bug, I was surprised that the AI failed a simple regular expression challenge. But it did.
Meta Code Llama
Meta Code Llama is an AI from Facebook designed specifically for coding assistance. You can download it and install it on your server. I tested it by running it on a Hugging Face AI instance.
Oddly, both Meta AI and Meta Code Llama failed 3 out of 4 of my tests, but on different problems. While you can't expect an AI to give the same answer twice, this result was surprising. We'll have to wait and see if that changes over time.
Claude 3.5 Sonnet
Anthropic claims that the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. Since it failed all but one of my tests, I'm not convinced.
If you aren't using it for programming, Claude may be a better choice than the free version of ChatGPT.
According to my ZDNET colleague Maria Diaz, Claude can process uploaded files, process more words than the free version of ChatGPT, provide information that is about a year newer than GPT-3.5, and access websites.
Gemini Advanced
Gemini Advanced is the $20-per-month pro version of Google's Gemini (formerly Bard) chatbot. I had hoped this tool would pass more than one of the four tests. Interestingly, it passed the one test that every AI other than GPT-4/4o failed: knowledge of a fairly obscure programming language written by a single programmer in Australia.
So if it knows that language, why can't it handle basic regular expressions and the other problems that beginner programmers deal with?
Microsoft Copilot
You'd think a company whose slogan is “Developers! Developers! Developers!” would be able to build an AI that would do better on programming tests. Microsoft makes some of the best coding tools in the world. But Copilot didn't fare so well.
The one bright spot is that Microsoft always learns from its mistakes, so check back later to see whether Copilot's results have improved.
It's just a matter of time
The results of my testing were pretty surprising, especially considering the huge investments from Microsoft and Google, but this area of innovation is moving forward at an incredible rate, so I will continue to update you with my testing and results. Stay tuned.
Have you ever used any of these AI chatbots for programming? What was your experience like? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Subscribe to my weekly newsletter, and follow me, David Gewirtz, on Twitter/X, on Facebook (Facebook.com/DavidGewirtz), on Instagram (Instagram.com/DavidGewirtz), and on YouTube (YouTube.com/DavidGewirtzTV).