2024 will undoubtedly be looked back on as the start of a groundbreaking decade: artificial intelligence has finally delivered on its early promises and truly arrived. Nowhere is this more evident than in the battle between LLMs, the large language models at the heart of the revolution.
These LLMs are the tools you use to access the power of AI on your computer, phone, and the web. They're used for everything from coding a new website to drafting emails and presentations: you type or speak your questions and get the answers you need. It's like a web search on steroids.
Whether you’re an AI believer or a skeptic, there’s no denying that major changes are happening around the world as people and businesses adopt these tools to get serious about personal and business tasks.
Two of the main players at the forefront are OpenAI with its ChatGPT models, and Anthropic with its Claude models. What has surprised many observers is how quickly Claude has progressed in its short history. Anthropic was founded in 2021 by former OpenAI executives (and siblings) Dario and Daniela Amodei to provide a "public utility" alternative to the established AI companies of the time.
The company launched its Claude LLM in 2023, touting it as a "safe and trustworthy" model focused on avoiding the dangers of AI. Despite the company receiving more than $6 billion in investment pledges from Google and Amazon, that first Claude model was met with a lukewarm public response, as many felt it was too restrictive for practical use.
However, the release of Claude 3.5 Sonnet in June 2024 shook the AI world with its utility and versatility across a wide range of uses. Suddenly, OpenAI found itself facing a formidable rival that many felt was superior to ChatGPT, especially for programming and chain-of-thought reasoning tasks.
All of this makes Claude worthy of recognition as one of the world's leading large language models.
Claude review: First impressions
(Image courtesy of Claude)
Signing up for an Anthropic account at Claude.ai is easy. Log in with your email or Google account and you can start using the prompt box right away. The default free account has hard limits of 5 requests per minute and 300K tokens per day. That may sound like a lot, but it's very easy to max out these limits once you actually start iterating on your projects.
Essentially, if you want to do more than simple text tasks like summarizing and translating, you'll want to upgrade to the Pro plan, which costs $20 per month. This tier allows up to 4,000 requests per minute on a pay-as-you-go basis.
Another good option is to use a third-party app with the Claude API, which doesn't seem to impose any obvious rate limits. I regularly use TypingMind.com with the API on a pay-as-you-go token basis and find it very useful. The only problem is that API users currently can't access Claude's Artifacts feature, though I hope that will change soon.
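For readers curious what that API access actually involves under the hood, here's a minimal sketch of building a single-turn request to Anthropic's public Messages endpoint using only Python's standard library. The endpoint URL and header names follow Anthropic's published HTTP API; the model ID is illustrative, so check the current documentation before using it.

```python
# Minimal sketch of a Claude API request using only the standard library.
# Endpoint and headers follow Anthropic's public Messages API; the model
# name is illustrative -- consult the docs for current model IDs.
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_claude_request(prompt: str, api_key: str,
                         model: str = "claude-3-5-sonnet-20240620") -> urllib.request.Request:
    """Build (but do not send) a single-turn chat request to Claude."""
    body = json.dumps({
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

# Sending the request (requires a real key and network access):
# with urllib.request.urlopen(build_claude_request("Hello, Claude!", my_key)) as resp:
#     reply = json.load(resp)["content"][0]["text"]
```

Third-party apps like TypingMind are essentially wrapping calls of this shape, which is why they bill you per token rather than per month.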
Claude review: In use
(Image courtesy of Claude)
An important thing to note is that Claude's world is split into two sections. Claude Chat (Claude.ai) is the public-facing chatbot that most people use. However, developers can also sign up for the console version, which offers more prompt management and engineering tools but lacks the excellent Artifacts functionality. You can sign up for both with the same email, but they're separate for usage and billing purposes, which can get a little confusing.
For this review, we ran tests using standard chat and Artifacts, an all-new feature that adds a WYSIWYG window next to the prompt window so you can see what the generated code is creating. It's a great way to see your creation come to life right before your eyes. The code behind the results is also just a click or download away, making it easy to iterate and test your ideas until they're perfectly formed and ready to use.
Quick tip: the Artifacts feature is not turned on by default – you must enable it manually by clicking your account name in the bottom left of the Claude home screen and selecting the Feature Preview menu option.
Chat mode worked very well, proving fast and accurate for simple tasks, but it tended to struggle with more complex requirements. One great feature worth mentioning: if you hit an error while iterating on an idea, you can simply copy and paste the error message into the Claude chat box and the AI will usually fix the issue instantly, which is super handy.
(Image courtesy of Claude)
For example, it took just a few seconds to create a YouTube comment analyzer web app using the YouTube API – in fact, it took longer to generate the YouTube API key than it did to create the app – and the few iterations I did to refine the results were also straightforward.
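To give a sense of what that app does, here's a rough reconstruction of its two halves: fetching comments via the YouTube Data API v3 (which needs the `google-api-python-client` package and your own API key) and summarizing them. This is my own sketch, not Claude's generated code, and the function names are illustrative.

```python
# Sketch of a YouTube comment analyzer: fetch_comments() shows the
# YouTube Data API v3 call shape (requires google-api-python-client
# and an API key); summarize_comments() is a pure, offline helper.
from collections import Counter

def fetch_comments(video_id: str, api_key: str, max_results: int = 100) -> list[str]:
    """Fetch top-level comment texts for a video (not executed here)."""
    from googleapiclient.discovery import build  # pip install google-api-python-client
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.commentThreads().list(
        part="snippet", videoId=video_id, maxResults=max_results
    ).execute()
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]

def summarize_comments(comments: list[str], top_n: int = 5) -> dict:
    """Return simple stats: comment count, average length, most common words."""
    words = Counter(
        word.lower().strip(".,!?")
        for comment in comments
        for word in comment.split()
    )
    return {
        "count": len(comments),
        "avg_length": sum(len(c) for c in comments) / max(len(comments), 1),
        "top_words": words.most_common(top_n),
    }
```

Claude's version wrapped equivalent logic in a web front end, but the API plumbing above is the part that took longest to set up by hand.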
(Image courtesy of Claude)
But when I tried to create a more complex interactive recipe app, pulling data from uploaded PDF files, things started to get complicated. I knew exactly what the problem was: my prompt requests were getting too long, and I was running out of context window.
(Image courtesy of Claude)
A simple version of the app could be launched in minutes, but as soon as I started improving it with more interactivity, the context space ran out and Claude began making mistakes, which is a shame because things were going very well up to that point. With a little more time and more optimized prompts, I think this issue could have been avoided entirely.
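The overflow problem above is easy to reason about with a back-of-the-envelope token budget. The sketch below uses the common (but very rough) rule of thumb of about four characters per token, against Claude 3.5 Sonnet's published 200K-token context window; the helper names are my own.

```python
# Rough sketch of budgeting a conversation against a context window.
# The 4-characters-per-token ratio is a rule of thumb, not exact;
# 200K matches Claude 3.5 Sonnet's published context size.
CONTEXT_WINDOW_TOKENS = 200_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def remaining_budget(history: list[str]) -> int:
    """Estimated tokens left before the conversation overflows the window."""
    used = sum(estimate_tokens(turn) for turn in history)
    return CONTEXT_WINDOW_TOKENS - used
```

Tracking something like this while iterating tells you when it's time to start a fresh chat with a condensed summary of the project so far, rather than pushing on until the model starts dropping details.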
(Image courtesy of Claude)
If I were a coder by trade, I could have continued and completed the work by hand, but as an avid amateur I had to leave it there. It's clear, though, that it won't be long before these LLMs are churning out games and apps on demand for anyone with a bit of drive and desire.
We also wanted to test the console application, as this is one of the differentiators of the recently released product that Anthropic is clearly proud of. A very useful feature of the console is the Workbench, where you can test, evaluate, and refine your prompts before using them in production. In practice, the Workbench proved to be a huge time and money saver: by spending a few credits to test different combinations of your prompts, you can see the actual results and whether the model responds appropriately to your requests.
(Image courtesy of Claude)
Two standout features of the Workbench are the ability to perform this detailed, multi-level testing, and the library of pre-made prompts that can speed up the entire production process. But the console's real purpose is to enable businesses to run teams and take control of their AI development, with features like the ability to easily invite collaborators, assign API keys, access reference documentation, and more.
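The testing workflow above revolves around prompt templates with variables, which the console renders using a `{{name}}`-style placeholder syntax. Here's a toy stand-in for that feature, so you can see why it's useful for evaluating one prompt against many inputs before spending credits; the harness itself is my own sketch, not Anthropic's code.

```python
# Toy version of a Workbench-style prompt-variable feature: fill
# {{placeholders}} in a template so one prompt can be tested against
# many inputs. This harness is an illustrative sketch, not Anthropic's code.
import re

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Replace every {{name}} placeholder; raise if a variable is missing."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return variables[name]
    return re.sub(r"\{\{(\w+)\}\}", replace, template)

template = "Summarize the following {{doc_type}} in {{num_words}} words:\n{{text}}"
prompt = fill_template(
    template, {"doc_type": "recipe", "num_words": "50", "text": "..."}
)
```

Swapping the variable values while holding the template fixed is exactly the kind of multi-level testing the Workbench makes cheap to do systematically.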
(Image courtesy of Claude)
OpenAI offers a similar experience in its Playground, which includes more features such as fine-tuning and an assistant creator; however, I'm not sure these are as useful for most people's needs. Fine-tuning, for example, is often a last resort, because many completion issues can be resolved up front with improved prompt engineering and function calls. And assembling, cleaning, and organizing the relevant datasets is no easy task, which can limit the effectiveness of fine-tuning in the first place.
(Image courtesy of Claude)
Either way, the Anthropic Workbench and account hub features are a testament to the company's commitment to the enterprise market. It's what separates LLM providers that simply offer a basic product from those focused on delivering a valuable AI ecosystem to their customers. The fact that you can get prompt code, track versions, and tweak everything from model settings to variables to system prompts makes this a mature place to get real work done. Anthropic has built this side of its product offering well.
Claude review: Conclusion
AI, chatbots, and LLMs are still in their early stages, so keep that in mind as you read this review. We are witnessing the seeds of a true tech revolution, and we shouldn't expect miracles from day one. That said, the work Anthropic has done over the past few months to make its products (especially Claude 3.5 Sonnet) competitive in the marketplace is incredible. With this latest model, the company has positioned itself to lead in many areas, including co-pilot programming.
That's not to say other models aren't as good or better in various application areas, but at the end of the day, people seem to prefer the understated quality of the Claude experience. From a personal perspective, 3.5 Sonnet is currently my preferred model for everyday use, which reflects how lackluster OpenAI's recent offerings have been. The race is just beginning, and I'm sure we'll soon see some amazing results from AI companies around the world. Until then, I'm happy to enjoy this impressive piece of American prose.