How to pass an AI coding benchmark: train on the questions

SWE-Bench Verified by OpenAI tests how well a model can solve real bugs in real Python code from GitHub.

These bugs are all public information — so the AI models have almost certainly trained on the actual text of the bug and on the fix for the bug.

In “The SWE-Bench Illusion,” researchers at Purdue and Microsoft try to check if various models memorise instead of reasoning. [arXiv, PDF]

They told the models the text of the issue and the name of the code repository for the program, and nothing else, then asked the model for the path to the file that needs fixing.

The models usually gave the correct answer! But they could only have given the right answer if they had trained on the questions.

OpenAI’s o3‑mini scored 76% when they gave it just the issue text and the repository name. Now that’s what I call vibe-coding.
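For the curious, the shape of that test is easy to sketch. This is a rough illustration only, not the researchers' code: the prompt wording, the `Item` fields, the path normalisation and the `predict` callable (a stand-in for whatever model you want to query) are all assumptions made up for this example.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One benchmark entry (hypothetical structure)."""
    repo: str        # repository name, e.g. "org/project"
    issue_text: str  # the issue description as filed
    gold_path: str   # the file the real fix actually touched


def build_prompt(item: Item) -> str:
    # The model sees only the repo name and the issue text: no code, no file tree.
    # The exact wording here is an assumption, not the paper's prompt.
    return (
        f"Repository: {item.repo}\n\n"
        f"Issue:\n{item.issue_text}\n\n"
        "Which file needs to be edited to fix this issue? "
        "Answer with the file path only."
    )


def normalize(path: str) -> str:
    # Tolerate quoting, backslashes, "./" prefixes and leading slashes in answers.
    cleaned = path.strip().strip("`'\"").replace("\\", "/")
    return posixpath.normpath(cleaned).lstrip("/")


def path_accuracy(items: Sequence[Item], predict: Callable[[str], str]) -> float:
    # Fraction of items where the predicted path matches the file that was fixed.
    if not items:
        return 0.0
    hits = sum(
        normalize(predict(build_prompt(item))) == normalize(item.gold_path)
        for item in items
    )
    return hits / len(items)
```

Plug in any function that takes a prompt string and returns the model's answer as `predict`. A model that has never seen the repository has little to go on beyond the issue text, so a high score on this number is hard to explain without memorisation.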

All public code on GitHub will be in the training data. When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

The researchers recommend better benchmarks “to ensure that reported progress reflects genuine advances in software engineering capabilities rather than dataset-specific artifacts.”

I don’t expect that to happen. The vendors will keep selling the code assistants to management on the dream of not paying programmers, with chatbot-generated LinkedIn posts saying how great AI codebots are. Real world performance never really mattered.

garren
2 days ago

Editorial Template for Every Time the United States Goes to War. “President...

Editorial Template for Every Time the United States Goes to War. “President [GUY WHO HAS COMMITTED MULTIPLE WAR CRIMES], a [REPUBLICAN / DEMOCRAT], sidestepped his [COMPLICIT / COWARDLY / TOTALLY INEPT] Congress…”
garren
2 days ago

Google bribes iNaturalist to use generative AI — volunteers quit in outrage

iNaturalist is a website that crowdsources pictures of plants and animals to help identify species. Its tagline is “A Community for Naturalists.”

iNaturalist is administered by its own small charity, but the work is done by a huge number of volunteer contributors — a bit like Wikipedia.

Sometimes a charity where volunteers do all the work forgets who does all the work and that these are volunteers, not minions. If someone waves a bit of money at them.

Every year, Google tries to launder its reputation by sending a bit of blood money to charities to greenwash its AI. This year’s round included $1.5 million to iNaturalist, who excitedly announced this on Twitter (and nowhere else). [Google; Twitter]

The volunteers — the ones who do all the work — were less than delighted. [iNat forum]

Two days later, iNaturalist explained the grant: [blog post]

By using generative AI (GenAI), we hope to synthesize information about how to distinguish different species and accurately convey that to iNaturalist users.

iNaturalist plans to use a Google chatbot to make up some hallucinations about data that had been uploaded by the volunteers. So how was this AI slop going to be fact-checked?

We will incorporate a feedback process for the AI-generated identification tips so that we can maintain high standards of accuracy.

That is, the volunteers would work for free to improve Google’s bot. This plan didn’t go down so well.

It turns out people do free work for knowledge because they hold principles and stuff. Many deleted their accounts — which also deletes their observations from iNaturalist — because they didn’t volunteer to feed a lying slop machine that’s an environmental disaster. And they no longer wanted anything to do with a charity so lost it didn’t see why this was not a good idea. [Scientific American]

iNaturalist has tried very hard to backpedal without backpedaling. Executive director Scott Loarie posted on the forum: [iNat forum]

I can assure you that I and the entire iNat team hates the AI slop that’s taking over the internet as much as you do.

… there’s no way we’re going to unleash AI generated slop onto the site.

Those are nice words, but AI-generated slop is still explicitly the plan. iNaturalist’s grant deliverable is “to have an initial demo available for select user testing by the end of 2025.”

You can tell what happened — Google promised iNaturalist free money if they would just do something, anything, that had some generative AI in it. iNaturalist forgot why people contribute at all, and took the cash.

The iNaturalist charity is currently “working on a response that should answer most of the major questions people have and provide more clarity.” [Twitter]

They’re sure the people who do the work for free hate this whole plan only because there’s not enough “clarity” — and not because it’s a terrible idea.

garren
15 days ago
1 public comment
tante
12 days ago
"You can tell what happened — Google promised iNaturalist free money if they would just do something, anything, that had some generative AI in it. iNaturalist forgot why people contribute at all, and took the cash."
Berlin/Germany
HarlandCorbin
12 days ago
Fuuu.... Wife and I use(d) this app to figure out some of the unknown plants that pop up in the garden and flower beds.

Shocked to hear ‘prompt engineer’ is not a real job

Who remembers “prompt engineering”? Ask ChatGPT questions real good and pull down $200,000 a year!

In 2023, a Washington Post headline said: “‘Prompt engineers’ are being hired for their skill in getting AI systems to produce exactly what they want, and they make pretty good money.” With “no coding required”! Huge if true. [Washington Post, 2023, archive]

The Wall Street Journal also claimed you could make a bundle asking a bot questions! If you also knew data science, machine learning, and programming. [WSJ, 2023, archive]

Two years later, it turns out there are not high-paying jobs in asking bots questions good. You might think there never were. [WSJ, archive]

It “remains unclear whether companies were ever truly hiring for individually titled prompt engineers.” [Fast Company, archive]

“Prompt engineer” is out. The new titles are “AI trainer,” “AI data specialist” or “AI security specialist.” You might think those are just “trainer,” “data specialist” or “security specialist” but you also have to put up with your boss thinking the chatbot does anything.

“Prompt engineer” was a promotional claim for chatbots that no-one could verify. Every scam needs one.

The hype did not magic the jobs into existence. Because this was all part of marketing chatbots to the enterprise. They wanted companies to believe in the magic of chatbots.

But you should know by now – if it sounds too good to be true, it probably is. That applies to AI just as much as any previous scam that promises you can get something for nothing.

garren
33 days ago
It just moved from “prompt engineer” to “AI engineer”. Like Sales Engineer, Marketing Engineer, and, arguably, Software Engineer, they’re not recognized as Engineering disciplines (in the US), but they sound good. You can’t work in tech and _not_ have “engineer” in your title.

Congratulations to Amazon on Its Partnership With the Saudi Prince Who Murdered...

Congratulations to Amazon on Its Partnership With the Saudi Prince Who Murdered Jeff Bezos’ Employee and Hacked His Phone. A profile in rapacity & cowardice.
garren
45 days ago

Do people know about Vocational Rehab? If you’re USAmerican they have this in every state.

eldritchbauble:

Do people know about Vocational Rehab? If you’re USAmerican they have this in every state.

It’s a program that helps disabled folks access education, training, and employment. For FREE.

You only have to be disabled to qualify (autism, ADHD, mental illness, physical illness, etc.) and they cover very broad categories of disability. You do NOT have to be officially diagnosed yet when you walk in - they will even help pay for your diagnosis if you are struggling with a disability.

I applied with my suspected autism and fibromyalgia, and they paid for 100% of my formal autism assessment.

Once your disability is established they will give you career counseling to learn about your interests and skills, and depending on the plan you create with your caseworker they will then help with school or finding employment. They paid for 100% of my college tuition and books, and even provided a laptop for me to use.

The program itself costs nothing. If you make above a certain income, you will have to contribute to educational costs but will still receive assistance.

They will also help with the cost of things like mental health counseling while you work towards your goals, clothing for interviews, etc.

They cannot discriminate based on your race, gender, or sexual orientation.

They won’t make you do excessive meetings.

They will allow you to do meetings with your caseworker remotely.

They will not drug test you.

They want you to succeed.

I’m sure that individual experiences vary but my caseworker was exceptionally easy to work with and very kind.

Vocational Rehab is a phenomenal resource every disabled person should be aware of. Here is the list of offices in every state:

garren
356 days ago