Anthropic delivers Claude 3.5 model – and a new way to work with chatbots

Fast, funny, visionary, sure ... anything that knocks OpenAI down a peg will do


Video OpenAI challenger Anthropic has delivered its latest model — Claude 3.5 Sonnet — and claimed it outperforms rivals on many tasks.

Anthropic delivered the model — the first release in the Claude 3.5 family — in a Thursday announcement in which the outfit claimed higher performance than OpenAI's GPT-4o, Google's Gemini 1.5 Pro, and an early snapshot of Meta's recently announced Llama3-400B model across a variety of knowledge-based benchmarks, depicted in the table below.

Anthropic claims the first entry in its Claude 3.5 model family already bests OpenAI and Google in a variety of knowledge-based benchmarks

Anthropic, built by ex-OpenAI staff and others including former Register vulture Jack Clark, also contends that Claude 3.5 Sonnet, which we'll just call Sonnet 3.5 from here on out, has a better grasp of humor and is therefore easier to work with. That and other improvements, Anthropic claims, mean the model is more reliable when asked to implement complex instructions.

Below is a video from the San Francisco upstart demoing its tech.

The release also introduced a feature in the Claude.ai chatbot, called "Artifacts," that sends content produced by the program to a dedicated window, which Anthropic described as "a dynamic workspace where they [users] can see, edit, and build upon Claude's creations in real-time, seamlessly integrating AI-generated content into their projects and workflows."

"In the near future, teams — and eventually entire organizations — will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate," Team Anthropic boasted.

This feature is no doubt helped by the fact that Sonnet 3.5 maintains its predecessor's 200,000-token context window, which you can think of as the model's short-term memory.
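For a rough sense of what 200,000 tokens buys you, here's a minimal Python sketch that estimates whether a document fits in that window. It leans on the common rule of thumb of roughly four characters per token – the real count depends on Claude's tokenizer – and the file name is purely a placeholder.

```python
# Rough check of whether a document fits in Claude 3.5 Sonnet's
# 200,000-token context window. The ~4 characters-per-token figure is a
# common rule of thumb, not Claude's actual tokenizer, so treat the
# result as an estimate only.
CONTEXT_WINDOW = 200_000
CHARS_PER_TOKEN = 4  # rough heuristic (assumption)


def fits_in_context(text: str, reserve_for_reply: int = 4_096) -> bool:
    """Return True if the text, plus room for a reply, should fit."""
    estimated_tokens = len(text) // CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_reply <= CONTEXT_WINDOW


with open("report.txt") as f:  # hypothetical input document
    print(fits_in_context(f.read()))
```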

Sonnet 3.5's vision processing has also improved, with a better ability to pick out text from complex images and to interpret graphs and charts. If Anthropic is to be believed, Sonnet 3.5 comes out on top in all but visual question-answering when pitted against GPT-4o and Gemini 1.5 on vision workloads.

When it comes to vision workloads, like interpreting graphics, Anthropic claims a leg up on ChatGPT and Gemini
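If you want to poke at those vision claims yourself, the Messages API accepts images as base64-encoded content blocks alongside text. Below is a minimal sketch assuming the anthropic Python SDK, an ANTHROPIC_API_KEY in the environment, and a local chart.png to interrogate.

```python
import base64

import anthropic  # pip install anthropic; picks up ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()

# Base64-encode a local chart so it can travel in the request body
with open("chart.png", "rb") as f:  # hypothetical local file
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # Claude 3.5 Sonnet's API identifier
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_data}},
            {"type": "text",
             "text": "What trend does this chart show? Answer in two sentences."},
        ],
    }],
)

print(message.content[0].text)
```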

Safety and privacy remain central tenets for the startup, which has assigned its latest model an AI Safety Level of 2 (ASL-2) – Anthropic associates higher levels with more dangerous capabilities. The ASL-2 rating covers models that "show early signs of dangerous capabilities" – such as the ability to teach someone how to create biological weapons – but that stop short of providing information a search engine couldn't surface.

To maintain the safety and privacy of its models, Anthropic also incorporated feedback from the UK's Artificial Intelligence Safety Institute and Thorn – an org which specializes in protecting children online – to fine-tune the model.

Sonnet 3.5 is available in Anthropic's web and mobile apps, while developers can integrate the model into their projects via Anthropic's API, Amazon Bedrock, or Google Vertex AI. API access will set you back $3 for every million input tokens and $15 for every million output tokens generated.
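As a back-of-the-envelope illustration of that pricing, the sketch below sends a text prompt through the anthropic Python SDK and totals the bill from the token counts the API reports back; the rates are hard-coded from the figures above, so adjust them if Anthropic's price list changes.

```python
import anthropic  # pip install anthropic; picks up ANTHROPIC_API_KEY from the environment

# Claude 3.5 Sonnet list prices at launch, in dollars per token
INPUT_RATE = 3.00 / 1_000_000    # $3 per million input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # $15 per million output tokens

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    messages=[{"role": "user",
               "content": "Summarize what a 200,000-token context window is good for."}],
)

# The response includes token usage, so working out the cost is simple arithmetic:
# e.g. 10,000 input + 1,000 output tokens would come to $0.03 + $0.015 = $0.045.
usage = message.usage
cost = usage.input_tokens * INPUT_RATE + usage.output_tokens * OUTPUT_RATE
print(f"{usage.input_tokens} tokens in, {usage.output_tokens} tokens out: ${cost:.6f}")
```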

Anthropic plans to add more models to the Claude 3.5 family, with Haiku and Opus variants slated for later this year. The model builder has already begun work on its next generation of AI models, which will integrate new features, such as memory, to further expand their capabilities.

As always with these LLMs, they do hallucinate and will get things wrong. They also do have their uses. YMMV. ®

PS: Away from the marketing and into the science, you may be interested in research Anthropic put out last month that described in fascinating detail the way in which its models work internally. The paper goes into the math with examples.

