OK, hot take. Anthropic released a pretty hefty update for their Claude 3.5 Sonnet model at the end of October. And as we’re all about language (and AI) here at Definition, we thought we’d put the update through its paces and test things like creativity and tone.

The new version’s only been around for a few weeks. So yep, I’m ready to get some big yellow egg on my face here.

Excuses established, let me go on…

I’m going to say the updated version of 3.5 Sonnet is better at content, and the old version is better at tone.

Here’s what I made of both versions when I took their writing for a test run.

Name a thing

Claude 3.5 Sonnet old: 5/10
Claude 3.5 Sonnet new: 5/10

Let’s start with a fun one.

So I asked the models to name a British vegan spread that comes in a cute black and yellow pot (the salty, dark spread that really divides opinions). Both 3.5 Sonnet versions guessed the product and put its actual name as the best choice (in a surprisingly self-effacing move, saying it was already really well named).

New Sonnet came up with some short and sensible names, based on concrete facts (Mighty Bite and Brewer’s Best weren’t bad). And the old Sonnet tackled the problem similarly, with ideas that ranged from the sublime Yum-ami to the inappropriate Savoury Smear.

Out of 60 names, the AI probably came up with three that I might put in front of my team. (That’s about what I’d expect at this stage in the evolution of AI.)

And between the Sonnet versions? There wasn’t much in it really. In fact, they both came up with Yeast Beast, VitaSpread, Yeast Feast and a name that riffed on Divider. But then they do have the same parent model, so maybe it’s not surprising.

Creativity (write a sonnet*)

*see what I did there

Claude 3.5 Sonnet old: 6/10
Claude 3.5 Sonnet new: 7/10

“Hi, can you write me a sonnet based on Shall I compare thee?” I asked the Sonnets, at the risk of sounding a little lonely.

“Can you write it in the Shakespearean format and the subject is making a cup of tea?” I said.

They both give me 14 lines in iambic pentameter in about three seconds.

I mean. Not bad.

The new version of Claude sends me its sonnet, rather considerately, as an attachment. It gives me a little summary of what it’s done, then asks if I’d like it to ‘explain any of the poetic devices used or perhaps write another version with a different focus?’

Slick.

And the actual poem? It’s logical and rounds off quite cleverly with an analogy about the tea-making ‘ritual’ being right because it goes ‘From Western morn to the wisdom of the East’ (a bit clunky but there’s a really sound idea and imagery of the sun moving around the earth in there). Its phrasing also departs from the original, so you get something quite new.

Its technique is weird though. It evokes Shakespeare by over-writing, but not in a good way (it describes tea as ‘amber liquor’ – the sort of description Steve Coogan would have a field day with when writing as Alan Partridge in Big Beacon). It’s also chucked in some ‘swift’ and ‘hence’ and ‘pour I forths’ which feel forced instead of bard-like.

But hey, it’s a sonnet about a cup of tea.

So what about the older version?

It serves me up a big clod of text, so I have to press return 13 times to get the line breaks. But it is 14 lines in total…

It also sticks more closely to the original. Its images and words keep more of The OG’s feeling, with talk of fleeting seasons, timelessness and setting suns, so tonally it retains the ‘mood’. But because it’s rehashing something else, it’s a bit nonsense-y in parts. And I think you’d be able to guess where it came from.

It tells me “the language attempts to mimic Shakespeare’s style, using archaic pronouns and verb forms (thou, thy, doth) and elevated, poetic diction” but I think it does that more subtly than the new version.

So I start thinking, maybe this new version of 3.5 Sonnet is better at content, and the older one is better at tone?

Summaries (make a long and boring thing short and clear)

Claude 3.5 Sonnet old: 6/10
Claude 3.5 Sonnet new: 5/10

No lonely sonnet requests this time.

Instead, I prompted each version to write me a short and clear exec summary of MoneySavingExpert.com’s editorial code. They both captured the key information – neither skipped anything critical. And neither of them hallucinated a bunch of nonsense (a nice surprise – we’re still in the era where it very much feels like a human has to check both).

New Sonnet was slick. It wrote a short summary first time round and included more detail like dates and establishing facts.

But old Sonnet behaved more like a human writer, ordering what it told me and putting the most important stuff first. I had to ask twice, but it got to a version that was shorter than the new Sonnet’s rewrite too.

Now it didn’t have everything in it (neither version did). But it certainly had the key stuff and was more digestible.

They both did well. But then I guess it’s not a surprise, since this kind of pattern spotting and digestion of information is exactly what generative AI is best at.

Tone of voice

Claude 3.5 Sonnet old: 7/10
Claude 3.5 Sonnet new: 5/10

The biggie. (For me, anyway.)

So how do they get on when I ask them to write a certain way? How prompt (ahem!) were their responses?

Feeling crafty, I asked the two versions to rewrite the first two paragraphs of Great Expectations as if they’d been written by a financial services firm – one known for its warm, open and accessible tone.

Let’s look at the results…

First things first, they’re both a bit Americanised. And cringey (but for different reasons).

So what else?

The older version of 3.5 Sonnet is more human. It writes in the first person (which is also true to the original book). It’s not bad at metaphors either: “Looking at my dad’s tombstone, with its bold square letters, I imagined he was a strong, solid guy.” The credit goes to Dickens for the idea, but this version of Sonnet had the sense to keep the best part of it.

Downside? It makes SEVEN analogies to finance in two paragraphs, which is, er, less than subtle. Yes, it’s clunky – and it sounds emotionally insensitive in the context of the protagonist talking at the gravestones of his own family. But again, that’s first and foremost a content problem, right? (It’s saying a thing too often that makes the tone go wonky.)

The new version is smarter at bringing in a concept. It works the financial angle in more subtly – it’s not shoe-horned in at every breath, and it builds up to quite a clever comparison with finance at the end (a bit like it did with the Shakespearean sonnet). Decent content.

But maybe it’s trying too hard to be smart in places? And that’s at the cost of the tone. Its attempt at riffing here is tonally and emotionally misguided: “Pip had five little inheritances that didn’t make it – his five little brothers.” Now, it’s not a bad pun. (But it is in really bad taste.) And that’s tone.

It writes in the third person, which is distancing. And then it tries to over-compensate for that with a weirdly casual American tone (it says things like ‘super young’ and talks about ‘like way too many children’, as if written by Bill and Ted). Is it too casual, I asked myself? Yes, I think so – and that’s tone too.

Scores on the doors?

Claude 3.5 Sonnet old: 24/40 (the winner)
Claude 3.5 Sonnet new: 22/40

So when it comes to doing language tasks with Claude 3.5 Sonnet, should you use the older version or the most recent update?

The truth is, there’s not a lot in it. But I’d say, if you want something where content’s the priority, try the new version. If you’re a writer, or it’s tone you’re after, the older version still stands up to the updated Sonnet and I think it’s better at nuance. And if you want a version of Claude that writes exclusively in your company’s tone of voice, then drop us a line to find out more about our prompt engineering and fine-tuning services.
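(A quick aside for any technical folk reading: you can run this kind of side-by-side test yourself, because both versions are available as separate snapshots through Anthropic’s API. Here’s a rough sketch using the Python SDK – the model IDs below were the published snapshot names at the time of writing, but check Anthropic’s own model list before you rely on them.)

```python
# A rough side-by-side test of the two Claude 3.5 Sonnet snapshots.
# Model IDs were correct at the time of writing - check Anthropic's
# current model list before relying on them.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODELS = {
    "old 3.5 Sonnet": "claude-3-5-sonnet-20240620",
    "new 3.5 Sonnet": "claude-3-5-sonnet-20241022",
}

PROMPT = (
    "Can you write me a sonnet in the Shakespearean format? "
    "The subject is making a cup of tea."
)

for label, model_id in MODELS.items():
    response = client.messages.create(
        model=model_id,
        max_tokens=500,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {label} ---")
    print(response.content[0].text)
```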

Curious to try Claude 3.5 Sonnet for work?

Why not try before you AI?


Written by Sarah Webster, Senior Writer at Definition