A McSweeney’s list that’s not that funny: How to Tell the Difference Between a Lone Wolf and a Coordinated Effort by the Radical Left.

Stiff Upper Lip

During the Battle of Waterloo, a cannon shot struck the right leg of Henry Paget, Second Earl of Uxbridge, prompting this quintessentially British exchange:

Uxbridge: By God, sir, I’ve lost my leg!

Wellington: By God, sir, so you have!

That may be apocryphal, but the leg went on to a colorful career of its own.

The single most British conversation in the history of human civilization, in my judgment, took place on the Upper Nile in 1899, when starving explorer Ewart Grogan stumbled out of the bush and surprised one Captain Dunn, medical officer of a British exploratory expedition:

Dunn: How do you do?

Grogan: Oh, very fit thanks; how are you? Had any sport?

Dunn: Oh pretty fair, but there is nothing much here. Have a drink? You must be hungry; I’ll hurry on lunch.

“It was not until the two men had almost finished the meal that Dunn thought it excusable to enquire about the identity and provenance of his guest.”

New AI Malware PoC Reliably Evades Microsoft Defender

Worried about hackers employing LLMs to write powerful malware? Using targeted reinforcement learning (RL) to train open-source models on specific tasks has yielded the capability to do just that.

Calling in the AI vibe-writing cleanup crew

You’ve heard of vibe coding, with, I’m sure, not a little horror. Spin the gacha and make a website! And then you have unfortunate experiences.

There are people who get quietly called in to fix the vibe code disaster areas. They tend not to talk about it in public.

But the other thing that cheap-arse clients think they can replace with a chatbot is copywriting.

And it turns out chatbot copy reads like slop, and it sucks, and it’s boring, and your eyes fall off it.

So copywriters, who can actually write, are getting called in to fix chatbot vibe-writing. And a few of them have gone public. [BBC]

Sarah Skidd, in Arizona, was called in to fix some terrible chatbot website writing. She charged $100 an hour:

It was the kind of copy that you typically see in AI copy – just very basic; it wasn’t interesting. It was supposed to sell and intrigue but instead it was very vanilla.

Skidd now has a side business fixing these.

Sophie Warner at Create Designs in the UK is seeing the same thing. Clients cheap out with a chatbot and get bad text on a bad, insecure vibe-coded website.

Warner says: “We are spending more time educating clients on the consequences.”

Where there’s muck, there’s brass. And sometimes the muck is toxic waste. And radioactive. So if you get called in to fix a vibe-slopchurned disaster, charge as much as you can. Then charge more than that.

How to pass an AI coding benchmark: train on the questions

SWE-Bench Verified by OpenAI tests how well a model can solve real bugs in real Python code from GitHub.

These bugs are all public information — so the AI models have almost certainly trained on the actual text of the bug and on the fix for the bug.

In “The SWE-Bench Illusion,” researchers at Purdue and Microsoft test whether various models memorise instead of reasoning. [arXiv, PDF]

They told the models the text of the issue and the name of the code repository for the program — and nothing else — then asked for the path to the file that needs fixing.

The models usually gave the correct answer! But they could only have given the right answer if they had trained on the questions.

OpenAI’s o3‑mini scored 76% when they gave it just the issue text and the repository name. Now that’s what I call vibe-coding.

All public code on GitHub will be in the training data. When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

The researchers recommend better benchmarks “to ensure that reported progress reflects genuine advances in software engineering capabilities rather than dataset-specific artifacts.”

I don’t expect that to happen. The vendors will keep selling the code assistants to management on the dream of not paying programmers, with chatbot-generated LinkedIn posts saying how great AI codebots are. Real-world performance never really mattered.

Editorial Template for Every Time the United States Goes to War. “President [GUY WHO HAS COMMITTED MULTIPLE WAR CRIMES], a [REPUBLICAN / DEMOCRAT], sidestepped his [COMPLICIT / COWARDLY / TOTALLY INEPT] Congress…”